Abstract 

We study the computational complexity of approximating the 2-to-q norm of linear 
operators (defined as $\|A\|_{2\to q} = \max_{v \neq 0} \|Av\|_q/\|v\|_2$) for $q > 2$, as well as connections 
between this question and issues arising in quantum information theory and the study 
of Khot's Unique Games Conjecture (UGC). We show the following: 

1. For any constant even integer $q \ge 4$, a graph $G$ is a small-set expander if and 
only if the projector into the span of the top eigenvectors of $G$'s adjacency matrix 
has bounded $2\to q$ norm. As a corollary, a good approximation to the $2\to q$ 
norm will refute the Small-Set Expansion Conjecture, a close variant of the 
UGC. We also show that such a good approximation can be obtained in $\exp(n^{2/q})$ 
time, thus obtaining a different proof of the known subexponential algorithm for 
Small-Set Expansion. 

2. Constant rounds of the "Sum of Squares" semidefinite programming hierarchy 
certify an upper bound on the $2\to 4$ norm of the projector to low-degree polynomials 
over the Boolean cube, as well as certify the unsatisfiability of the "noisy 
cube" and "short code" based instances of Unique Games considered by prior 
works. This improves on the previous upper bound of $\exp(\log^{\Omega(1)} n)$ rounds (for 
the "short code"), as well as separates the "Sum of Squares"/"Lasserre" hierarchy 
from weaker hierarchies that were known to require $\omega(1)$ rounds. 

3. We show reductions between computing the $2\to 4$ norm and computing the 
injective tensor norm of a tensor, a problem with connections to quantum information 
theory. Three corollaries are: (i) the $2\to 4$ norm is NP-hard to approximate 
to precision inverse-polynomial in the dimension, (ii) the $2\to 4$ norm 
does not have a good approximation (in the sense above) unless 3-SAT can be 
solved in time $\exp(\sqrt{n}\,\mathrm{poly}\log(n))$, and (iii) known algorithms for the quantum 
separability problem imply a non-trivial additive approximation for the $2\to 4$ 
norm. 
1 Introduction 



For a function $f\colon \Omega \to \mathbb{R}$ on a (finite) probability space $\Omega$, the $p$-norm is defined as 
$\|f\|_p = (\mathbb{E}_{\Omega} |f|^p)^{1/p}$.¹ The $p\to q$ norm $\|A\|_{p\to q}$ of a linear operator $A$ between vector spaces of 
such functions is the smallest number $c \ge 0$ such that $\|Af\|_q \le c\|f\|_p$ for all functions $f$ 
in the domain of $A$. We also define the $p\to q$ norm of a subspace $V$ to be the maximum 
of $\|f\|_q/\|f\|_p$ for $f \in V$; note that for $p = 2$ this is the same as the norm of the projector 
operator into $V$. 

In this work, we are interested in the case $p < q$ and we will call such $p\to q$ norms 
hypercontractive.² Roughly speaking, for $p < q$, a function $f$ with large $\|f\|_q$ compared 
to $\|f\|_p$ can be thought of as "spiky" or somewhat sparse (i.e., much of the mass concentrated 
in a small portion of the entries). Hence finding a function $f$ in a linear subspace $V$ 
maximizing $\|f\|_q/\|f\|_2$ for some $q > 2$ can be thought of as a geometric analogue of the 
problem of finding the shortest word in a linear code. This problem is equivalent to computing 
the $2\to q$ norm of the projector $P$ into $V$ (since $\|Pf\|_2 \le \|f\|_2$). Also, when $A$ is a 
normalized adjacency matrix of a graph (or more generally a Markov operator), upper bounds 
on the $p\to q$ norm are known as mixed-norm, Nash or hypercontractive inequalities and 
can be used to show rapid mixing of the corresponding random walk (e.g., see the surveys 
[Gro75, SC97]). Such bounds also have many applications to theoretical computer science, 
which are described in the survey [Bis11]. 
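To make the "spiky" intuition concrete, here is a minimal Python sketch using the expectation norms of footnote 1 on a uniform space of $N$ points (the helper names `exp_norm` and `spikiness` are ours, chosen for illustration). For the indicator of a set of measure $\delta$ one gets $\|1_S\|_q/\|1_S\|_2 = \delta^{1/q-1/2}$, which grows as $\delta \to 0$, while a constant function has ratio exactly 1.

```python
def exp_norm(f, p):
    """Expectation norm ||f||_p = (E_x |f(x)|^p)^(1/p) under the uniform measure."""
    return (sum(abs(v) ** p for v in f) / len(f)) ** (1.0 / p)


def spikiness(f, q):
    """Ratio ||f||_q / ||f||_2; at least 1 for q >= 2, and large for sparse f."""
    return exp_norm(f, q) / exp_norm(f, 2)


N = 1000
constant = [1.0] * N                          # mass spread evenly
indicator = [1.0] * 10 + [0.0] * (N - 10)     # indicator of a set of measure 0.01

print(spikiness(constant, 4))    # 1.0
print(spikiness(indicator, 4))   # about 3.162, i.e. 0.01 ** (1/4 - 1/2)
```

Increasing $q$ only makes the indicator look spikier, which is why the $2\to q$ norm of a subspace measures how sparse its vectors can be.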

However, very little is known about the complexity of computing these norms. This is 
in contrast to the case of $p\to q$ norms for $p > q$, where much more is known both in terms 
of algorithms and lower bounds; see [Ste05, KNS08, BV11]. 

2 Our Results 

We initiate a study of the computational complexity of approximating the $2\to 4$ (and 
more generally $2\to q$ for $q > 2$) norm. While there are still many more questions than 
answers on this topic, we are able to show some new algorithmic and hardness results, as 
well as connections to both Khot's Unique Games Conjecture [Kho02] (UGC) and questions 
from quantum information theory. In particular, our paper gives some conflicting evidence 
regarding the validity of the UGC and its close variant, the Small-Set Expansion Hypothesis 
(SSEH) of [RS10]. (See also our conclusions section.) 

First, we show in Theorem 2.5 that approximating the $2\to 4$ problem to within any constant 
factor cannot be done in polynomial time (unless SAT can be solved in $\exp(o(n))$ time), 
yet this problem is seemingly related to the Unique Games and Small-Set Expansion 
problems. In particular, we show that approximating the $2\to 4$ norm is Small-Set 
Expansion-hard, yet has a subexponential algorithm which is closely related to the [ABS10] 
algorithm for Unique Games and Small-Set Expansion. Thus the computational difficulty 
of this problem can be considered as some indirect evidence supporting the validity of the 
UGC (or perhaps some weaker variants of it). To our knowledge, this is the first evidence 
of this kind for the UGC. 

On the other hand, we show that a natural polynomial-time algorithm (based on an 
SDP hierarchy) solves the previously proposed hard instances for Unique Games. 

¹We follow the convention of using expectation norms for functions (on probability spaces) and counting 
norms, denoted as $\|v\|_{\ell_p} = (\sum_{i=1}^m |v_i|^p)^{1/p}$, for vectors $v \in \mathbb{R}^m$. All normed spaces here will be finite dimensional. 

We distinguish between expectation and counting norms to avoid recurrent normalization factors. 

²We use this name because a bound of the form $\|A\|_{p\to q} \le 1$ for $p < q$ is often called a hypercontractive 
inequality. 

The previous best algorithms for some of these instances took almost exponential 
($\exp(\exp(\log^{\Omega(1)} n))$) time, and in fact they were shown to require super-polynomial time 
for some hierarchies. Thus this result suggests that this algorithm could potentially refute 
the UGC, and hence can be construed as evidence opposing the UGC's validity. 

2.1 Algorithms 

We show several algorithmic results for the $2\to 4$ (and more generally $2\to q$) norm. 

2.1.1 Subexponential algorithm for "good" approximation 

For $q > 2$, we say that an algorithm provides a $(c, C)$-approximation for the $2\to q$ norm if, 
on input an operator $A$, the algorithm can distinguish between the case that $\|A\|_{2\to q} \le c\sigma$ 
and the case that $\|A\|_{2\to q} \ge C\sigma$, where $\sigma = \sigma_{\min}(A)$ is the minimum nonzero singular value 
of $A$. (Note that since we use the expectation norm, $\|Af\|_q \ge \|Af\|_2 \ge \sigma\|f\|_2$ for every 
function $f$ orthogonal to the kernel of $A$.) We say that an algorithm provides a good 
approximation for the $2\to q$ norm if it provides a $(c, C)$-approximation for some (dimension-independent) 
constants $c < C$. The motivation behind this definition is to capture the notion 
of a dimension-independent approximation factor; it is also motivated by Theorem 2.4 
below, which relates a good approximation for the $2\to q$ norm to solving the Small-Set 
Expansion problem. 
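Computing $\|A\|_{2\to q}$ is the hard direction; lower bounds are easy, since any single test function $f$ certifies $\|A\|_{2\to q} \ge \|Af\|_q/\|f\|_2$. The Python sketch below (hypothetical helper names; random sampling is only a heuristic and is not the algorithm of Theorem 2.1) uses expectation norms on both the domain and the range, as in the note above, so for the identity operator every sampled ratio is at least 1.

```python
import random


def exp_norm(v, p):
    """Expectation norm (E_i |v_i|^p)^(1/p) under the uniform measure on indices."""
    return (sum(abs(x) ** p for x in v) / len(v)) ** (1.0 / p)


def matvec(A, f):
    """Matrix-vector product A f, with A given as a list of rows."""
    return [sum(a * x for a, x in zip(row, f)) for row in A]


def sampled_lower_bound(A, q, trials=2000, seed=0):
    """Heuristic lower bound on ||A||_{2->q} (expectation norms on both sides):
    the best ratio ||A f||_q / ||f||_2 over random Gaussian test functions f.
    Every sampled f certifies a valid lower bound; nothing is certified from above."""
    rng = random.Random(seed)
    n = len(A[0])
    best = 0.0
    for _ in range(trials):
        f = [rng.gauss(0.0, 1.0) for _ in range(n)]
        best = max(best, exp_norm(matvec(A, f), q) / exp_norm(f, 2))
    return best


# Identity on R^4: ||I||_{2->4} = 4 ** (1/2 - 1/4) = sqrt(2), attained at a point mass.
I4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
print(sampled_lower_bound(I4, 4))
```

Random Gaussian vectors are spread out, so for this example they stay well below the point-mass value $\sqrt{2}$; this gap between easy lower bounds and the true norm is what makes certification the interesting problem.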

We show the following: 

Theorem 2.1. For every $1 < c < C$, there is a $\mathrm{poly}(n)\exp(n^{2/q})$-time algorithm that computes 
a $(c, C)$-approximation for the $2\to q$ norm of any linear operator whose range is 
$\mathbb{R}^n$. 

Combining this with our results below, we get as a corollary a subexponential algorithm 
for the Small-Set Expansion problem matching the parameters of [ABS10]'s algorithm. We 
note that this algorithm can be achieved by the "Sum of Squares" SDP hierarchy described 
below (and probably weaker hierarchies as well, although we did not verify this). 

2.1.2 Polynomial algorithm for specific instances 

We study a natural semidefinite programming (SDP) relaxation for computing the $2\to 4$ 
norm of a given linear operator, which we call Tensor-SDP.³ While Tensor-SDP is very 
unlikely to provide a poly-time constant-factor approximation for the $2\to 4$ norm in general 
(see Theorem 2.5 below), we do show that it provides such an approximation on two very 
different types of instances: 

- We show that Tensor-SDP certifies a constant upper bound on the ratio 
$\|A\|_{2\to 4}/\|A\|_{2\to 2}$ where $A\colon \mathbb{R}^n \to \mathbb{R}^m$ is a random linear operator (e.g., obtained 
by a matrix with entries chosen as i.i.d. Bernoulli variables) and $m = \Omega(n^2 \log n)$. In 
contrast, if $m = o(n^2)$ then this ratio is $\omega(1)$, and hence this result is almost tight in 
the sense of obtaining a "good approximation" as defined above. We find 
this interesting, since random matrices seem like natural instances; indeed for superficially 
similar problems such as shortest codeword, shortest lattice vector (or even the 
$1\to 2$ norm), it seems hard to efficiently certify bounds on random operators. 

³We use the name Tensor-SDP for this program since it will be a canonical relaxation of the polynomial 
program $\max_{\|x\|_2 = 1} \langle T, x^{\otimes 4}\rangle$ where $T$ is the 4-tensor such that $\langle T, x^{\otimes 4}\rangle = \|Ax\|_4^4$. See Section 4.5 for more details. 
- We show that Tensor-SDP gives a good approximation of the $2\to 4$ norm of the 
operator projecting a function $f\colon \{\pm 1\}^n \to \mathbb{R}$ into its low-degree component: 

Theorem 2.2. Let $P_d$ be the linear operator that maps a function $f\colon \{\pm 1\}^n \to \mathbb{R}$ of the 
form $f = \sum_{\alpha \subseteq [n]} \hat f_\alpha \chi_\alpha$ to its low-degree part $f' = \sum_{|\alpha| \le d} \hat f_\alpha \chi_\alpha$ (where $\chi_\alpha(x) = \prod_{i\in\alpha} x_i$). 
Then Tensor-SDP$(P_d) \le 9^d$. 

The fact that $P_d$ has bounded $2\to 4$ norm is widely used in the literature relating to 
the UGC. Previously, no general-purpose algorithm was known to efficiently certify 
this fact. 

2.1.3 Quasipolynomial algorithm for additive approximation 

We also consider the generalization of Tensor-SDP to a natural SDP hierarchy. This is 
a convex relaxation that starts from an initial SDP and tightens it by adding additional 
constraints. Such hierarchies are generally parameterized by a number $r$ (often called the 
number of rounds), where the first round corresponds to the initial SDP, and the $n$-th round (for 
discrete problems where $n$ is the instance size) corresponds to the exponential brute-force 
algorithm that outputs an optimal answer. Generally, the $r$-th round of each such hierarchy 
can be evaluated in $n^{O(r)}$ time (though in some cases $n^{O(1)}2^{O(r)}$ time suffices [BRS11]). See 
Section 3, as well as the surveys [CT10, Lau03] and the papers [SA90, LS91, RS09, KPS10] 
for more information about these hierarchies. 
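One way to see where the $n^{O(r)}$ bound comes from: in the formulation of Section 3, the positivity constraints of an $r$-round program form a single PSD matrix indexed by monomials of degree at most $r/2$ (indexing by monomials is one standard choice of basis, assumed here). The Python sketch below counts those monomials and checks the count against $\binom{n+r/2}{r/2} \le (n+1)^{r/2}$.

```python
from itertools import combinations_with_replacement
from math import comb


def monomials(n, deg):
    """All monomials of total degree <= deg in x_0, ..., x_{n-1}, each encoded as a
    sorted tuple of variable indices (the empty tuple stands for the constant 1)."""
    out = []
    for k in range(deg + 1):
        out.extend(combinations_with_replacement(range(n), k))
    return out


def moment_matrix_side(n, r):
    """Number of rows of a moment matrix indexed by monomials of degree <= r/2."""
    return len(monomials(n, r // 2))


for n in (5, 10, 20):
    # comb(n + r/2, r/2) <= (n + 1) ** (r/2): polynomial in n for fixed r.
    print(n, moment_matrix_side(n, 4), comb(n + 2, 2))
```

For fixed $r$ the matrix side grows only polynomially in $n$, which is why a constant number of rounds still gives a polynomial-time SDP.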

We call the hierarchy we consider here the Sum of Squares (SoS) hierarchy. It is 
not novel but rather a variant of the hierarchies studied by several authors including 
Shor [Sho87], Parrilo [Par00, Par03], Nesterov [Nes00] and Lasserre [Las01]. (Generally, 
in our context these hierarchies can be made equivalent in power, though there are some 
subtleties involved; see [Lau09] and Appendix C for more details.) We describe the SoS 
hierarchy formally in Section 3. We show that Tensor-SDP's extension to several rounds 
of the SoS hierarchy gives a non-trivial additive approximation: 

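Theorem 2.3. Let Tensor-SDP$^{(d)}$ denote the $n^{O(d)}$-time algorithm obtained by extending 
Tensor-SDP to $d$ rounds of the Sum-of-Squares hierarchy. Then for all $\varepsilon$, there is $d = O(\log(n)/\varepsilon^2)$ 
such that 

$$\|A\|_{2\to 4}^4 \le \text{Tensor-SDP}^{(d)}(A) \le \|A\|_{2\to 4}^4 + \varepsilon\,\|A\|_{2\to 2}^2\,\|A\|_{2\to\infty}^2 .$$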

The term $\|A\|_{2\to 2}^2\|A\|_{2\to\infty}^2$ is a natural upper bound on $\|A\|_{2\to 4}^4$ obtained using Hölder's 
inequality. Since $\|A\|_{2\to 2}$ is the largest singular value of $A$, and $\|A\|_{2\to\infty}$ is the largest 2-norm 
of any row of $A$, they can be computed quickly. Theorem 2.3 shows that one can improve 
this upper bound by a factor of $\varepsilon$ using run time $\exp(\log^2(n)/\varepsilon^2)$. Note, however, that in the 
special case (relevant to the UGC) that $A$ is a projector to a subspace $V$, $\|A\|_{2\to 2} = 1$ and 
$\|A\|_{2\to\infty} \ge \sqrt{\dim(V)}$ (see Lemma 10.1), which unfortunately means that Theorem 2.3 does 
not give any new algorithms in that setting. 
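The Hölder upper bound is easy to evaluate. Here is a Python sketch with counting norms on both sides (an assumption made for simplicity; with the expectation norm on an $m$-dimensional range, both $\|A\|_{2\to 4}^4$ and $\|A\|_{2\to 2}^2$ pick up the same factor $1/m$). It uses $\|A\|_{2\to\infty}$ as the largest row norm, estimates $\|A\|_{2\to 2}$ by power iteration, and relies on $\sum_i y_i^4 \le (\max_i y_i^2)\sum_i y_i^2 \le \|A\|_{2\to\infty}^2\|A\|_{2\to 2}^2\|x\|^4$ for $y = Ax$. The helper names are ours.

```python
import random


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def norm2(v):
    return sum(t * t for t in v) ** 0.5


def op_norm_2_to_2(A, iters=200):
    """Estimate the largest singular value of A by power iteration on A^T A.
    The estimate never exceeds ||A||_{2->2} and converges to it when the start
    vector is not orthogonal to the top right singular vector."""
    At = [list(col) for col in zip(*A)]
    v = [1.0] * len(A[0])
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        s = norm2(w)
        if s == 0.0:
            return 0.0
        v = [t / s for t in w]
    return norm2(matvec(A, v))


def op_norm_2_to_inf(A):
    """||A||_{2->inf} with counting norms: the largest Euclidean norm of a row."""
    return max(norm2(row) for row in A)


def holder_bound(A):
    """Upper bound ||A||_{2->2}^2 * ||A||_{2->inf}^2 on ||A||_{2->4}^4, using
    sum_i y_i^4 <= (max_i y_i^2) * sum_i y_i^2 for y = A x."""
    return op_norm_2_to_2(A) ** 2 * op_norm_2_to_inf(A) ** 2


A = [[3.0, 0.0], [0.0, 1.0]]
print(op_norm_2_to_2(A), op_norm_2_to_inf(A), holder_bound(A))
```

Both quantities need only matrix-vector products, in contrast with $\|A\|_{2\to 4}$ itself.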

Despite Theorem 2.3 being a non-quantum algorithm for an ostensibly non-quantum 
problem, we actually achieve it using the results of Brandão, Christandl and 
Yard [BaCY11] about the quantum separability problem. In fact, it turns out that the SoS 
hierarchy extension of Tensor-SDP is equivalent to techniques that have been used to 
approximate separable states [DPS04]. We find this interesting both because there are few positive 
general results about the convergence rate of SDP hierarchies, and because the techniques 
of [BaCY11], based on entanglement measures of quantum states, are different from typical 
ways of proving correctness of semidefinite programs, and in particular different from the 
techniques we use to analyze Tensor-SDP in other settings. This connection also means 
that integrality gaps for Tensor-SDP would imply new types of separable states that pass 
most of the known tests for entanglement. 

2.2 Reductions 

We relate the question of computing the hypercontractive norm to two other problems 
considered in the literature: the small-set expansion problem [RS10, RST10a], and the 
injective tensor norm question studied in the context of quantum information theory [HM10, 
BaCY11]. 

2.2.1 Hypercontractivity and small set expansion 

Khot's Unique Games Conjecture [Kho02] (UGC) has been the focus of intense research 
effort in the last few years. The conjecture posits the hardness of approximation for a 
certain constraint-satisfaction problem, and shows promise to settle many open questions 
in the theory of approximation algorithms. Many works have been devoted to studying the 
plausibility of the UGC, as well as exploring its implications and obtaining unconditional 
results inspired or motivated by this effort. Tantalizingly, at the moment we have very 
little insight into whether this conjecture is actually true, and thus producing evidence on 
the UGC's truth or falsity is a central effort in computational complexity. Raghavendra 
and Steurer [RS10] proposed a hypothesis closely related to the UGC called the Small-Set 
Expansion Hypothesis (SSEH). Loosely speaking, the SSEH states that it is NP-hard 
to certify that a given graph $G = (V, E)$ is a small-set expander in the sense that subsets 
of size $o(|V|)$ have almost all their neighbors outside the set. [RS10] showed that 
SSEH implies UGC. While a reduction in the other direction is not known, all currently 
known algorithmic and integrality gap results apply to both problems equally well (e.g., 
[ABS10, RST10b]), and thus the two conjectures are likely to be equivalent. 

We show, loosely speaking, that a graph is a small-set expander if and only if the projection 
operator to the span of its top eigenvectors has bounded $2\to 4$ norm. To make this 
precise, if $G = (V, E)$ is a regular graph, then let $P_{\ge\lambda}(G)$ be the projection operator into the 
span of the eigenvectors of $G$'s normalized adjacency matrix with eigenvalue at least $\lambda$, and 
let $\Phi_G(\delta)$ be $\min_{S \subseteq V,\ |S| \le \delta|V|} \Pr_{(u,v)\in E}[v \notin S \mid u \in S]$. 

Then we relate small-set expansion to the $2\to 4$ norm (indeed the $2\to q$ norm for even 
$q \ge 4$) as follows: 

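Theorem 2.4. For every regular graph $G$, $\lambda > 0$ and even $q$, 

1. (Norm bound implies expansion) For all $\delta > 0$, $\varepsilon > 0$, $\|P_{\ge\lambda}(G)\|_{2\to q} \le \varepsilon/\delta^{(q-2)/(2q)}$ 
implies that $\Phi_G(\delta) \ge 1 - \lambda - \varepsilon^2$. 

2. (Expansion implies norm bound) There is a constant $c$ such that for all $\delta > 0$, $\Phi_G(\delta) > 
1 - \lambda 2^{-cq}$ implies $\|P_{\ge\lambda}(G)\|_{2\to q} \le 2/\sqrt{\delta}$. 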
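For intuition about the quantity $\Phi_G(\delta)$ in Theorem 2.4, the following brute-force Python sketch (exponential time, for tiny graphs only; the function names are ours) evaluates it directly from the definition above, restricted to nonempty sets $S$. On the 8-cycle, small sets expand poorly: a single edge $\{i, i+1\}$ keeps half of its edge endpoints inside.

```python
from itertools import combinations


def small_set_expansion(adj, delta):
    """Brute-force Phi_G(delta) for a regular graph given by adjacency lists: the
    minimum, over nonempty S with |S| <= delta * |V|, of the probability that a
    uniformly random edge (u, v) with u in S has v outside S.
    Exponential time, so only usable on tiny graphs; returns 1.0 if no set qualifies."""
    n = len(adj)
    best = 1.0
    for k in range(1, int(delta * n) + 1):
        for S in combinations(range(n), k):
            inside = set(S)
            total = sum(len(adj[u]) for u in S)
            leaving = sum(1 for u in S for v in adj[u] if v not in inside)
            best = min(best, leaving / total)
    return best


def cycle(n):
    """Adjacency lists of the n-cycle (a 2-regular graph)."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


print(small_set_expansion(cycle(8), 0.25))  # 0.5: two adjacent vertices
print(small_set_expansion(cycle(8), 0.5))   # 0.25: an arc of 4 vertices
```

The theorem replaces this exponential search by a statement about a single operator norm, which is what makes the connection algorithmically interesting.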

While one direction (bounded hypercontractive norm implies small-set expansion) was 
already known,⁴ to our knowledge the other direction is novel. As a corollary we show that 
the SSEH implies that there is no good approximation for the $2\to 4$ norm. 

⁴While we do not know who was the first to point out this fact explicitly, within theoretical CS it was 
implicitly used in several results relating the Bonami-Beckner-Gross hypercontractivity of the Boolean noise 
operator to isoperimetric properties, one example being O'Donnell's proof of the soundness of [KV05]'s 
integrality gap (see [KV05, Sec 9.1]). 
2.2.2 Hypercontractivity and the injective tensor norm 

We are able to make progress in understanding both the complexity of the $2\to 4$ norm and 
the quality of our SDP relaxation by relating the $2\to 4$ norm to several natural questions 
about tensors. An $r$-tensor can be thought of as an $r$-linear form on $\mathbb{R}^n$, and the injective 
tensor norm $\|\cdot\|_{\mathrm{inj}}$ of a tensor is given by maximizing this form over all unit vector inputs. 
See Section 9 for a precise definition. When $r = 1$, this norm is the 2-norm of a vector and 
when $r = 2$, it is the operator norm (or $2\to 2$ norm) of a matrix, but for $r = 3$ it becomes 
NP-hard to calculate. One motivation to study this norm comes from quantum mechanics, 
where computing it is equivalent to a number of long-studied problems concerning 
entanglement and many-body physics [HM10]. More generally, tensors arise in a vast range of 
practical problems involving multidimensional data [vL09] for which the injective tensor 
norm is both of direct interest and can be used as a subroutine for other tasks, such as tensor 
decomposition [dlVKKV05]. 
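Since the injective norm maximizes a multilinear form over unit vectors, any choice of unit vectors gives a lower bound. The Python sketch below uses alternating maximization (the higher-order power method), a standard heuristic that is not a method from this paper: it repeatedly replaces one vector by the normalized contraction of $T$ with the other two. Consistent with the NP-hardness for $r = 3$ noted above, it only produces lower bounds and can stall at local optima. All names are ours.

```python
import random


def contract_except(T, vecs, skip):
    """Contract a 3-tensor T (nested lists T[i][j][k]) with vecs[mode] on every
    mode except `skip`; the result is a vector indexed by mode `skip`."""
    dims = (len(T), len(T[0]), len(T[0][0]))
    out = [0.0] * dims[skip]
    for i in range(dims[0]):
        for j in range(dims[1]):
            for k in range(dims[2]):
                idx = (i, j, k)
                w = T[i][j][k]
                for mode in range(3):
                    if mode != skip:
                        w *= vecs[mode][idx[mode]]
                out[idx[skip]] += w
    return out


def unit(v):
    s = sum(t * t for t in v) ** 0.5
    return [t / s for t in v]  # assumes v is nonzero


def injective_lower_bound(T, iters=50, seed=0):
    """Alternating maximization (higher-order power method) for max T(x, y, z)
    over unit vectors. Heuristic: it returns T(x, y, z) at the final unit vectors,
    which is a lower bound on ||T||_inj, not a certified value."""
    rng = random.Random(seed)
    dims = (len(T), len(T[0]), len(T[0][0]))
    vecs = [unit([rng.gauss(0.0, 1.0) for _ in range(d)]) for d in dims]
    for _ in range(iters):
        for mode in range(3):
            vecs[mode] = unit(contract_except(T, vecs, mode))
    x_dir = contract_except(T, vecs, 0)          # T(., y, z)
    return sum(a * b for a, b in zip(x_dir, vecs[0]))


# Rank-one example T = 2 * u (x) v (x) w with unit u, v, w: ||T||_inj = 2.
u, v, w = [1.0, 0.0], [0.0, 1.0], [0.6, 0.8]
T = [[[2 * u[i] * v[j] * w[k] for k in range(2)] for j in range(2)] for i in range(2)]
print(injective_lower_bound(T))
```

On a rank-one tensor the iteration locks onto the factors after one sweep; for general tensors the returned value lies between 0 and the Frobenius norm.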

It is not hard to show that $\|A\|_{2\to 4}^4$ is actually equal to $\|T\|_{\mathrm{inj}}$ for some 4-tensor $T = T_A$. 
Not all 4-tensors can arise this way, but we show that the injective tensor norm problem for 
general tensors can be reduced to those of the form $T_A$. Combined with known results about 
the hardness of tensor computations, this reduction implies the following hardness result. To 
formulate the theorem, recall that the Exponential Time Hypothesis (ETH) [IPZ98] states 
that 3-SAT instances of length $n$ require time $\exp(\Omega(n))$ to solve. 

Theorem 2.5 (informal version). Assuming ETH, for any $\varepsilon, \delta$ satisfying $2\varepsilon + \delta < 1$, 
the $2\to 4$ norm of an $m \times m$ matrix $A$ cannot be approximated to within an $\exp(\log^{\varepsilon}(m))$ 
multiplicative factor in time less than … time. This hardness result holds even when $A$ 
is a projector. 

While we are primarily concerned with the case of an $\Omega(1)$ approximation factor, we note 
that poly-time approximations to within an inverse-polynomial (in the dimension) 
multiplicative error are not possible unless P = NP. This, along with Theorem 2.5, is restated 
more formally as Theorem 9.4 in Section 9.2. We also show there that Theorem 2.5 yields as a 
corollary that, assuming ETH, there is no polynomial-time algorithm obtaining a good 
approximation for the $2\to 4$ norm. We note that these results hold under weaker assumptions 
than the ETH; see Section 9.2 as well. 

Previously no hardness results were known for the $2\to 4$ norm, or any $p\to q$ norm with 
$p < q$, even for calculating the norms exactly. However, hardness of approximation results 
for $1 + 1/\mathrm{poly}(n)$ multiplicative error have been proved for other polynomial optimization 
problems [BTN98]. 
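For concreteness, the 4-tensor $T_A$ from the reduction paragraph can be sketched as follows. The paper's construction is in Section 9; the code below is our own sketch of the natural choice $T_A = \sum_i a_i^{\otimes 4}$ over the rows $a_i$ of $A$, with counting norms, and matching the paper's normalization is an assumption. Evaluating the 4-linear form on $x^{\otimes 4}$ then reproduces $\|Ax\|_4^4$, the polynomial appearing in footnote 3.

```python
import random
from itertools import product


def build_T_A(A):
    """4-tensor T_A = sum_i a_i (x) a_i (x) a_i (x) a_i over the rows a_i of A,
    stored as a dict from index 4-tuples to entries."""
    n = len(A[0])
    return {idx: sum(row[idx[0]] * row[idx[1]] * row[idx[2]] * row[idx[3]] for row in A)
            for idx in product(range(n), repeat=4)}


def quartic_form(T, x):
    """Evaluate <T, x (x) x (x) x (x) x> = sum_{j,k,l,m} T[j,k,l,m] x_j x_k x_l x_m."""
    return sum(t * x[j] * x[k] * x[l] * x[m] for (j, k, l, m), t in T.items())


rng = random.Random(3)
A = [[rng.randint(-3, 3) for _ in range(3)] for _ in range(4)]
T = build_T_A(A)
x = [rng.randint(-2, 2) for _ in range(3)]
Ax = [sum(a * b for a, b in zip(row, x)) for row in A]
print(quartic_form(T, x) == sum(t ** 4 for t in Ax))  # True: <T_A, x^{(x)4}> = ||Ax||_4^4
```

Integer entries keep the comparison exact. The tensor is symmetric, which is why maximizing over a single unit vector $x$ recovers its injective norm.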

2.3 Relation to the Unique Games Conjecture 

Our results and techniques have some relevance to the Unique Games Conjecture. Theorem 2.4 
shows that obtaining a good approximation for the $2\to q$ norm is Small-Set 
Expansion-hard, but Theorem 2.1 shows that this problem is not "that much harder" than 
Unique Games and Small-Set Expansion since it too has a subexponential algorithm. Thus, 
the $2\to q$ problem is in some informal sense "of similar flavor" to Unique Games and 
Small-Set Expansion. On the other hand, we actually are able to show in Theorem 2.5 
hardness (even if only quasipolynomial) for this problem, whereas a similar result for Unique 
Games or Small-Set Expansion would be a major breakthrough. So there is a sense in which 
these results can be thought of as some "positive evidence" in favor of at least weak variants 
of the UGC. (We emphasize, however, that there are inherent difficulties in extending these 
results to Unique Games, and it may very well be that obtaining a multiplicative 
approximation to the $2\to 4$ norm of an operator is a significantly harder problem than Unique Games or 
Small-Set Expansion.) In contrast, our positive algorithmic results show that perhaps the 
$2\to q$ norm can be thought of as a path to refuting the UGC. In particular, we are able to 
extend our techniques to show that a polynomial-time algorithm can approximate the canonical 
hard instances for Unique Games considered in prior works. 

Theorem 2.6 (Informal). Eight rounds of the SoS relaxation certify that it is possible to 
satisfy at most a $1/100$ fraction of the constraints of Unique Games instances of the "quotient 
noisy cube" and "short code" types considered in [RS09, KS09, KPS10, BGH+11]. 

These instances are the same ones for which previous works showed that weaker 
hierarchies such as "SDP+Sherali-Adams" and "Approximate Lasserre" require $\omega(1)$ rounds to 
certify that one cannot satisfy almost all the constraints [KV05, RS10, KS09, BGH+11]. In 
fact, for the "short code" based instances of [BGH+11] there was no upper bound known 
better than $\exp(\log^{\Omega(1)} n)$ on the number of rounds required to certify that they are not 
almost satisfiable, regardless of the power of the hierarchy used. 

This is significant since the current best known algorithms for Unique Games utilize 
SDP hierarchies [BRS11, GS11],⁵ and the instances above were the only known evidence 
that polynomial-time versions of these algorithms do not refute the Unique Games 
Conjecture. Our work also shows that strong "basis independent" hierarchies such as Sum of 
Squares [Par00, Par03] and Lasserre [Las01] can in fact do better than the seemingly only 
slightly weaker variants.⁶ 

3 The SoS hierarchy 

For our algorithmic results in this paper we consider a semidefinite programming (SDP) 
hierarchy that we call the Sum of Squares (SoS) hierarchy. This is not a novel algorithm, and essentially 
the same hierarchies were considered by many other researchers (see the survey [Lau09]). 
Because different works sometimes used slightly different definitions, in this section we 
formally define the hierarchy we use as well as explain the intuition behind it. While there 
are some subtleties involved, one can think of this hierarchy as equivalent in power to the 
programs considered by Parrilo, Lasserre and others, while stronger than hierarchies such as 
"SDP+Sherali-Adams" and "Approximate Lasserre" considered in [RS09, KPS10, BRS11]. 

The SoS SDP is a relaxation for polynomial equations. That is, we consider a system 
of the following form: maximize $P_0(x)$ over $x \in \mathbb{R}^n$ subject to $P_i^2(x) = 0$ for $i = 1, \ldots, m$, where 
$P_0, \ldots, P_m$ are polynomials of degree at most $d$.⁷ For $r \ge 2d$, the $r$-round SoS SDP optimizes 
over $x_1, \ldots, x_n$ that can be thought of as formal variables rather than actual numbers. For 
these formal variables, expressions of the form $P(x)$ are well defined and correspond to a 

⁵Both these works showed SDP-hierarchy-based algorithms matching the performance of the subexponential 
algorithm of [ABS10]. [GS11] used the Lasserre hierarchy, while [BRS11] used the weaker "SDP+Sherali-Adams" 
hierarchy. 

⁶The only other result of this kind we are aware of is [KMN11], which shows that Lasserre gives a better 
approximation ratio than the linear programming Sherali-Adams hierarchy for the knapsack problem. We do 
not know if weaker semidefinite hierarchies match this ratio, although knapsack of course has a simple 
dynamic-programming-based PTAS. 

⁷This form is without loss of generality, as one can translate an inequality constraint of the form $P_j(x) \ge 0$ 
into the equality constraint $(P_j(x) - y^2)^2 = 0$ where $y$ is some new auxiliary variable. It is useful to show 
equivalences between various hierarchy formulations; see also Appendix C. 
real number (which can be computed from the SDP solution) as long as $P$ is a polynomial of 
degree at most $r$. These numbers obey the linearity property that $(P+Q)(x) = P(x) + Q(x)$, 
and, most importantly, the positivity property that $P^2(x) \ge 0$ for every polynomial $P$ 
of degree at most $r/2$. These expressions satisfy all initial constraints (i.e., $P_i^2(x) = 0$ for 
$i = 1, \ldots, m$) and the value of the SDP is set to be the expression $P_0(x)$. The above means 
that to show that the SoS relaxation has value at most $v$ it is sufficient to give any proof that 
derives from the constraints $\{P_i^2(x) = 0\}_{i=1,\ldots,m}$ the conclusion that $P_0(x) \le v$ using only the 
linearity and positivity properties, without using any polynomials of degree larger than $r$ in 
the intermediate steps. In fact, such a proof always has the form 

$$v - P_0(x) = \sum_{i=1}^{k} R_i(x)^2 + \sum_{i=1}^{m} P_i(x)\,Q_i(x), \qquad (3.1)$$

where $R_1, \ldots, R_k, Q_1, \ldots, Q_m$ are arbitrary polynomials satisfying $\deg R_i \le r/2$ and $\deg P_iQ_i \le 
r$. The polynomial $\sum_i R_i(x)^2$ is a SoS (sum of squares), and optimizing over such polynomials 
(along with the $Q_1, \ldots, Q_m$) can be achieved with a semidefinite program. 
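As a toy illustration of a certificate of the form (3.1) (our example, not one from the paper), take the program "maximize $x$ subject to $(x^2-1)^2 = 0$", i.e., $P_0(x) = x$ and $P_1(x) = x^2 - 1$ with $d = 2$ and $r = 4$. The identity $1 - x = \tfrac12(x-1)^2 + (x^2-1)\cdot(-\tfrac12)$ has $\deg R_1 = 1 \le r/2$ and $\deg P_1Q_1 = 2 \le r$, so the 4-round SoS value is at most 1, which is tight since $x = 1$ is feasible. The Python sketch checks the identity with exact rational arithmetic; the helper names are hypothetical.

```python
from fractions import Fraction as F


def padd(p, q):
    """Sum of two polynomials stored as coefficient lists (index = power of x)."""
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]


def pmul(p, q):
    """Product of two polynomials stored as coefficient lists."""
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


# Toy program: maximize P0(x) = x subject to P1(x)^2 = 0, P1(x) = x^2 - 1 (d = 2, r = 4).
P0 = [F(0), F(1)]
P1 = [F(-1), F(0), F(1)]
v = F(1)

# Certificate of the form (3.1): v - P0 = R1^2 + P1 * Q1 with R1 = (x - 1)/sqrt(2) and
# Q1 = -1/2. R1^2 = (1/2)(x - 1)^2 is stored directly to keep exact rational arithmetic.
R1_sq = pmul([F(1, 2)], pmul([F(-1), F(1)], [F(-1), F(1)]))
Q1 = [F(-1, 2)]

lhs = padd([v], [-c for c in P0])
rhs = padd(R1_sq, pmul(P1, Q1))
print(trim(lhs) == trim(rhs))  # True, so the 4-round SoS value is at most 1
```

Finding such $R_i$ and $Q_i$ automatically is exactly the SDP; here the certificate is simply written down and verified.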

Pseudo-expectation view. For more intuition about the SoS hierarchy, one can imagine 
that instead of being formal variables, $x_1, \ldots, x_n$ actually correspond to correlated random 
variables $X_1, \ldots, X_n$ over $\mathbb{R}$, and the expression $P(x)$ is set to equal the expectation $\mathbb{E}[P(X)]$. 
In this case, the linearity and positivity properties are obviously satisfied by these 
expressions, although other properties that would be obtained if $x_1, \ldots, x_n$ were simply numbers 
might not hold. For example, the property that $R(x) = P(x)Q(x)$ if $R = P \cdot Q$ does not 
necessarily hold, since it is not always the case that $\mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y]$ for every two random 
variables $X, Y$. So, another way to describe the $r$-round SoS hierarchy is that the 
expressions $P(x)$ (for $P$ of degree at most $r$) satisfy some of the constraints that would have 
been satisfied if these expressions corresponded to expectations over some correlated random 
variables $X_1, \ldots, X_n$. For this reason, we will use the notation $\tilde{\mathbb{E}}_x P(x)$ instead of $P(x)$, 
where we refer to the functional $\tilde{\mathbb{E}}$ as a level-$r$ pseudo-expectation functional (or $r$-p.e.f. for 
short). Also, rather than describing $x_1, \ldots, x_n$ as formal variables, we will refer to them as 
level-$r$ fictitious random variables (or $r$-f.r.v. for short) since in some sense they look like 
true correlated random variables up to their $r$-th moment. 

We can now present our formal definition of pseudo-expectation and the SoS hierarchy:⁸ 

Definition 3.1. Let $\tilde{\mathbb{E}}$ be a functional that maps polynomials $P$ over $\mathbb{R}^n$ of degree at most $r$ 
into a real number, which we denote by $\tilde{\mathbb{E}}_x P(x)$ or $\tilde{\mathbb{E}} P$ for short. We say that $\tilde{\mathbb{E}}$ is a level-$r$ 
pseudo-expectation functional ($r$-p.e.f. for short) if it satisfies: 

Linearity For all polynomials $P, Q$ of degree at most $r$ and $\alpha, \beta \in \mathbb{R}$, $\tilde{\mathbb{E}}(\alpha P + \beta Q) = 
\alpha\tilde{\mathbb{E}}P + \beta\tilde{\mathbb{E}}Q$. 

Positivity For every polynomial $P$ of degree at most $r/2$, $\tilde{\mathbb{E}}P^2 \ge 0$. 

Normalization $\tilde{\mathbb{E}}1 = 1$, where on the left-hand side, $1$ denotes the degree-0 polynomial that is the 
constant 1. 

Definition 3.2. Let $P_0, \ldots, P_m$ be polynomials over $\mathbb{R}^n$ of degree at most $d$, and let $r \ge 
2d$. The value of the $r$-round SoS SDP for the program "max $P_0$ subject to $P_i^2 = 0$ for 
$i = 1, \ldots, m$" is equal to the maximum of $\tilde{\mathbb{E}}P_0$, where $\tilde{\mathbb{E}}$ ranges over all level-$r$ 
pseudo-expectation functionals satisfying $\tilde{\mathbb{E}}P_i^2 = 0$ for $i = 1, \ldots, m$. 

⁸We use the name "Sum of Squares" since the positivity condition in Definition 3.1 is the most important constraint 
of this program. However, some prior works used this name for the dual of the program we define here. As we 
show in Appendix C, in many cases of interest to us there is no duality gap. 

The functional $\tilde{\mathbb{E}}$ can be represented by a table of size $n^{O(r)}$ containing the 
pseudo-expectations of every monomial of degree at most $r$ (or some other linear basis for 
polynomials of degree at most $r$). For a linear functional $\tilde{\mathbb{E}}$, the map $P \mapsto \tilde{\mathbb{E}}P^2$ is a quadratic form. 
Hence, $\tilde{\mathbb{E}}$ satisfies the positivity condition if and only if the corresponding quadratic form is 
positive semidefinite. It follows that the convex set of level-$r$ pseudo-expectation 
functionals over $\mathbb{R}^n$ admits an $n^{O(r)}$-time separation oracle, and hence the $r$-round SoS relaxation 
can be solved up to accuracy $\varepsilon$ in time $(mn \cdot \log(1/\varepsilon))^{O(r)}$. 
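For a single variable ($n = 1$) the table is just the list of pseudo-moments $\tilde{\mathbb{E}}x^0, \ldots, \tilde{\mathbb{E}}x^r$, and the quadratic form $P \mapsto \tilde{\mathbb{E}}P^2$ is given by the Hankel matrix $M_{ij} = \tilde{\mathbb{E}}x^{i+j}$. The Python sketch below (helper names ours) checks the positivity condition of Definition 3.1 for a few candidate functionals via the exact criterion that a symmetric matrix is PSD if and only if all of its principal minors are nonnegative. This test is only practical for tiny matrices; an SDP solver works with eigenvalues or factorizations instead.

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n = len(M)
    d = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return d


def is_psd(M):
    """A symmetric matrix is PSD iff every principal minor is nonnegative."""
    n = len(M)
    for k in range(1, n + 1):
        for S in combinations(range(n), k):
            if det([[M[i][j] for j in S] for i in S]) < 0:
                return False
    return True


def moment_matrix(moments):
    """Hankel matrix M[i][j] = E~ x^(i+j) of a univariate level-r functional,
    given moments[k] = E~ x^k for k = 0..r (r even)."""
    h = (len(moments) - 1) // 2
    return [[moments[i + j] for j in range(h + 1)] for i in range(h + 1)]


print(is_psd(moment_matrix([1, 0, 1, 0, 1])))               # True: moments of a uniform sign
print(is_psd(moment_matrix([1, 0, -1, 0, 1])))              # False: E~ x^2 < 0
print(is_psd(moment_matrix([1, 0, 1, 0, Fraction(1, 2)])))  # False: E~ (x^2 - 1)^2 = -1/2
```

The last functional passes every check on monomials individually, yet fails positivity for the non-monomial polynomial $x^2 - 1$, which is the kind of basis-independent constraint the SoS hierarchy enforces.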

As noted above, for every random variable $X$ over $\mathbb{R}^n$, the functional $\tilde{\mathbb{E}}P := \mathbb{E}P(X)$ is 
a level-$r$ pseudo-expectation functional for every $r$. As $r \to \infty$, this hierarchy of 
pseudo-expectations will converge to the expectations of a true random variable [Las01], but the 
convergence is in general not guaranteed to happen in a finite number of steps [dKL11]. 

Whenever there can be ambiguity about what the variables of the polynomial $P$ 
inside an $r$-p.e.f. $\tilde{\mathbb{E}}$ are, we will use the notation $\tilde{\mathbb{E}}_x P(x)$ (e.g., $\tilde{\mathbb{E}}_x x^3$ is the same as $\tilde{\mathbb{E}}P$ where 
$P$ is the polynomial $x \mapsto x^3$). As mentioned above, we call the inputs $x$ to the polynomial 
level-$r$ fictitious random variables, or $r$-f.r.v. for short. 

Remark 3.3. The main difference between the SoS hierarchy and weaker SDP hierarchies 
considered in the literature, such as SDP+Sherali-Adams and the Approximate Lasserre 
hierarchies [RS09, KPS10], is that the SoS hierarchy treats all polynomials equally and 
hence is agnostic to the choice of basis. For example, the Approximate Lasserre hierarchy 
can also be described in terms of pseudo-expectations, but these pseudo-expectations are 
only defined for monomials, and are allowed some small error. While they can be extended 
linearly to other polynomials, for non-sparse polynomials that error can greatly accumulate. 

3.1 Basic properties of pseudo-expectation 

For two polynomials $P$ and $Q$, we write $P \preceq Q$ if $Q = P + \sum_{i=1}^m R_i^2$ for some polynomials 
$R_1, \ldots, R_m$. 

If $P$ and $Q$ have degree at most $r$, then $P \preceq Q$ implies that $\tilde{\mathbb{E}}P \le \tilde{\mathbb{E}}Q$ for every $r$-p.e.f. $\tilde{\mathbb{E}}$. 
This follows using linearity and positivity, as well as the (not too hard to verify) observation 
that if $Q - P = \sum_i R_i^2$ then it must hold that $\deg(R_i) \le \max\{\deg(P), \deg(Q)\}/2$ for every $i$. 

We would like to understand how polynomials behave on linear subspaces of $\mathbb{R}^n$. A 
map $P\colon \mathbb{R}^n \to \mathbb{R}$ is polynomial over a linear subspace $V \subseteq \mathbb{R}^n$ if $P$ restricted to $V$ agrees 
with a polynomial in the coefficients for some basis of $V$. Concretely, if $g_1, \ldots, g_m$ is an 
(orthonormal) basis of $V$, then $P$ is polynomial over $V$ if $P(f)$ agrees with a polynomial in 
$\langle f, g_1\rangle, \ldots, \langle f, g_m\rangle$. We say that $P \preceq Q$ holds over a subspace $V$ if $Q - P$, as a polynomial 
over $V$, is a sum of squares. 

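Lemma 3.4. Let $P$ and $Q$ be two polynomials over $\mathbb{R}^n$ of degree at most $r$, and let $B\colon \mathbb{R}^n \to 
\mathbb{R}^k$ be a linear operator. Suppose that $P \preceq Q$ holds over the kernel of $B$. Then $\tilde{\mathbb{E}}P \le \tilde{\mathbb{E}}Q$ 
holds for any $r$-p.e.f. $\tilde{\mathbb{E}}$ over $\mathbb{R}^n$ that satisfies $\tilde{\mathbb{E}}_f\|Bf\|^2 = 0$. 

Proof. Since $P \preceq Q$ over the kernel of $B$, we can write $Q(f) = P(f) + \sum_{j=1}^m R_j^2(f) + 
\sum_{j=1}^k (Bf)_j S_j(f)$ for polynomials $R_1, \ldots, R_m$ and $S_1, \ldots, S_k$ over $\mathbb{R}^n$. By positivity, 
$\tilde{\mathbb{E}}_f R_j^2(f) \ge 0$ for all $j \in [m]$. We claim that $\tilde{\mathbb{E}}_f (Bf)_j S_j(f) = 0$ for all $j \in [k]$ (which 
would finish the proof). This claim follows from the fact that $\tilde{\mathbb{E}}_f (Bf)_j^2 = 0$ for all $j \in [k]$ 
(since $\tilde{\mathbb{E}}_f\|Bf\|^2 = \sum_j \tilde{\mathbb{E}}_f (Bf)_j^2 = 0$ and each term is nonnegative) and Lemma 3.5 below. □ 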
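Lemma 3.5 (Pseudo Cauchy-Schwarz). Let $P$ and $Q$ be two polynomials of degree at most 
$r$. Then $\tilde{\mathbb{E}}PQ \le \sqrt{\tilde{\mathbb{E}}P^2}\,\sqrt{\tilde{\mathbb{E}}Q^2}$ for any degree-$2r$ pseudo-expectation functional $\tilde{\mathbb{E}}$. 

Proof. We first consider the case $\tilde{\mathbb{E}}P^2, \tilde{\mathbb{E}}Q^2 > 0$. Then, by linearity of $\tilde{\mathbb{E}}$, we may assume 
that $\tilde{\mathbb{E}}P^2 = \tilde{\mathbb{E}}Q^2 = 1$. Since $2PQ \preceq P^2 + Q^2$ (by expanding the square $(P-Q)^2$), it follows 
that $\tilde{\mathbb{E}}PQ \le \tfrac12\tilde{\mathbb{E}}P^2 + \tfrac12\tilde{\mathbb{E}}Q^2 = 1$, as desired. It remains to consider the case $\tilde{\mathbb{E}}P^2 = 0$ 
(the case $\tilde{\mathbb{E}}Q^2 = 0$ is symmetric). In this case, $2aPQ \preceq P^2 + a^2Q^2$ implies that $\tilde{\mathbb{E}}PQ \le \tfrac{a}{2}\,\tilde{\mathbb{E}}Q^2$ 
for all $a > 0$. Letting $a \to 0$ gives $\tilde{\mathbb{E}}PQ \le 0$ (and applying the same argument to $-Q$ shows $\tilde{\mathbb{E}}PQ = 0$), 
as desired. □ 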
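Lemma 3.5 and its degenerate case can be checked numerically. The Python sketch below (ours; it reuses the univariate moment-matrix representation $\tilde{\mathbb{E}}PQ = p^\top M q$, with $p$ and $q$ the coefficient vectors in the basis $1, x, x^2$) verifies the squared inequality $(\tilde{\mathbb{E}}PQ)^2 \le \tilde{\mathbb{E}}P^2 \cdot \tilde{\mathbb{E}}Q^2$ exactly for random integer polynomials, and shows that $\tilde{\mathbb{E}}P^2 = 0$ forces $\tilde{\mathbb{E}}PQ = 0$. The functionals used here come from genuine distributions, which is an assumption made for simplicity.

```python
import random
from fractions import Fraction


def pexp(M, p, q):
    """E~[P Q] = p^T M q for polynomials with coefficient vectors p, q in the basis
    1, x, x^2, where M[i][j] = E~ x^(i+j) is the functional's moment matrix."""
    return sum(p[i] * M[i][j] * q[j] for i in range(len(p)) for j in range(len(q)))


def hankel(moments):
    return [[moments[i + j] for j in range(3)] for i in range(3)]


# Level-4 functional from X uniform on {-1, 0, 2}: E~ x^k = ((-1)^k + 0^k + 2^k) / 3.
moments = [Fraction((-1) ** k + 0 ** k + 2 ** k, 3) for k in range(5)]
M = hankel(moments)

rng = random.Random(0)
for _ in range(1000):
    p = [rng.randint(-5, 5) for _ in range(3)]
    q = [rng.randint(-5, 5) for _ in range(3)]
    # Lemma 3.5 applied to Q and -Q, in squared form: (E~PQ)^2 <= E~P^2 * E~Q^2.
    assert pexp(M, p, q) ** 2 <= pexp(M, p, p) * pexp(M, q, q)

# Degenerate case: for X uniform on {-1, +1} and P = x^2 - 1 we get E~P^2 = 0,
# and then E~PQ = 0 for every Q of degree <= 2.
M2 = hankel([1, 0, 1, 0, 1])
P = [-1, 0, 1]
print(pexp(M2, P, P), pexp(M2, P, [3, -2, 5]))  # 0 0
```

The degenerate case is exactly what the next paragraph uses: a constraint $\tilde{\mathbb{E}}P_i^2 = 0$ kills every multiple $P_iQ_i$ of bounded degree.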

Lemma 3.5 also explains why our SDP in Definition 3.2 is dual to the one in (3.1). If 
$\tilde{\mathbb{E}}$ is a level-$r$ pseudo-expectation functional satisfying $\tilde{\mathbb{E}}[P_i^2] = 0$, then Lemma 3.5 implies 
that $\tilde{\mathbb{E}}[P_iQ_i] = 0$ for all $Q_i$ with $\deg P_iQ_i \le r$. 

Appendix A contains some additional useful facts about pseudo-expectation functionals. 
In particular, we will make repeated use of the fact that they satisfy another 
Cauchy-Schwarz analogue: namely, for any level-2 f.r.v.'s $f, g$, we have $\tilde{\mathbb{E}}_{f,g}\langle f, g\rangle \le 
\sqrt{\tilde{\mathbb{E}}_f\|f\|^2}\,\sqrt{\tilde{\mathbb{E}}_g\|g\|^2}$. This is proven in Lemma A.4. 

3.2 Why is this SoS hierarchy useful? 

Consider the following example. It is known that if $f\colon \{\pm1\}^n \to \mathbb{R}$ is a degree-$d$ polynomial, then

$$9^d \Big(\mathbb{E}_{w \in \{\pm1\}^n} f(w)^2\Big)^2 \ge \mathbb{E}_{w \in \{\pm1\}^n} f(w)^4 \qquad (3.2)$$

(see e.g. [O'D07]). Equivalently, the linear operator $P_d$ on $\mathbb{R}^{\{\pm1\}^n}$ that projects a function into the degree-$d$ polynomials satisfies $\|P_d\|_{2\to4} \le 9^{d/4}$. This fact is known as the hypercontractivity of low-degree polynomials, and was used in several integrality gap results such as [KV05]. By following the proof of (3.2), we show in Lemma 5.1 that a stronger statement is true:

$$9^d \Big(\mathbb{E}_{w} f(w)^2\Big)^2 = \mathbb{E}_{w} f(w)^4 + \sum_i Q_i(f)^2, \qquad (3.3)$$

where the $Q_i$'s are polynomials of degree at most 2 in the variables $\{\hat f(\alpha)\}_{\alpha \subseteq [n]}$ specifying the coefficients of the polynomial $f$. By using the positivity constraints, (3.3) implies that (3.2) holds even in the 4-round SoS relaxation where we consider the coefficients of $f$ to be given by a 4-f.r.v. This proves Theorem 2.2, showing that the SoS relaxation certifies that

$$\|P_d\|_{2\to4} \le 9^{d/4}.$$
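Inequality (3.2) is easy to sanity-check numerically on a small cube. The Python sketch below is an illustration only (it certifies nothing, unlike the SoS identity (3.3)): it draws a hypothetical random degree-$d$ polynomial by sampling Gaussian Fourier coefficients $\hat f(\alpha)$ for all $|\alpha| \le d$, evaluates it on all of $\{\pm1\}^n$, and compares $\mathbb{E} f^4$ with $9^d (\mathbb{E} f^2)^2$. The helper names are ours.

```python
import itertools
import random

def chi(alpha, x):
    # Character chi_alpha(x) = prod_{i in alpha} x_i on {+1,-1}^n.
    out = 1
    for i in alpha:
        out *= x[i]
    return out

def random_low_degree(n, d, rng):
    # Gaussian coefficients f_hat(alpha) for every alpha subset of [n] with |alpha| <= d.
    return {alpha: rng.gauss(0.0, 1.0)
            for size in range(d + 1)
            for alpha in itertools.combinations(range(n), size)}

def second_and_fourth_moment(coeffs, n):
    # (E f^2, E f^4) with x uniform on {+1,-1}^n, by full enumeration of the cube.
    cube = list(itertools.product((-1, 1), repeat=n))
    vals = [sum(c * chi(a, x) for a, c in coeffs.items()) for x in cube]
    m2 = sum(v ** 2 for v in vals) / len(cube)
    m4 = sum(v ** 4 for v in vals) / len(cube)
    return m2, m4

if __name__ == "__main__":
    rng = random.Random(1)
    n = 6
    for d in range(4):
        worst = 0.0
        for _ in range(50):
            m2, m4 = second_and_fourth_moment(random_low_degree(n, d, rng), n)
            worst = max(worst, m4 / m2 ** 2)
        print(f"d={d}: max E f^4 / (E f^2)^2 = {worst:.2f}  vs  9^d = {9 ** d}")
```

Random instances are typically far from the worst case, so the observed ratios sit well below $9^d$.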

Remark 3.6. Unfortunately, to describe the result above we needed to use the term "degree" in two different contexts. The SDP relaxation considers polynomial expressions of degree at most 4 in the coefficients of $f$. This is a different notion of degree than the degree $d$ of $f$ itself as a polynomial over $\mathbb{R}^n$. In particular, the variables of this SoS program are the coefficients $\{\hat f(\alpha)\}_{\alpha \subseteq [n],\, |\alpha| \le d}$. Note that for every fixed $w$, the expression $f(w)$ is a linear polynomial over these variables, and hence the expressions $\big(\mathbb{E}_{w \in \{\pm1\}^n} f(w)^2\big)^2$ and $\mathbb{E}_{w \in \{\pm1\}^n} f(w)^4$ are degree-4 polynomials over the variables.

While the proof of (3.3) is fairly simple, we find the result (that hypercontractivity of polynomials is efficiently certifiable) somewhat surprising. The reason is that hypercontractivity serves as the basis of the integrality gap results, which are exactly instances of maximization problems where the objective value is low but this is supposedly hard to certify. In particular, we consider integrality gaps for Unique Games considered before in the literature. All of these instances follow the framework initiated by Khot and Vishnoi [KV05]. Their idea was inspired by Unique Games hardness proofs, with the integrality gap obtained by composing an initial instance with a gadget. The proof that these instances have "cheating" SDP solutions is obtained by "lifting" the completeness proof of the gadget. On the other hand, the soundness property of the gadget, combined with some isoperimetric results, showed that the instances do not have real solutions. This approach of lifting completeness proofs of reductions was used to get other integrality gap results as well [Tul09]. We show that the SoS hierarchy allows us to lift a certain soundness proof for these instances, which includes a (variant of the) invariance principle of [MOO05], influence decoding à la [KKMO04], and hypercontractivity of low-degree polynomials. It turns out that all these results can be proven via sum-of-squares type arguments and hence lifted to the SoS hierarchy.

4 Overview of proofs 

We now give a very high-level overview of the tools we use to obtain our results, leaving details to the later sections and appendices.

4.1 Subexponential algorithm for the 2-to-q norm 

Our subexponential algorithm for obtaining a good approximation for the $2\to q$ norm is extremely simple. It is based on the observation that a subspace $V \subseteq \mathbb{R}^n$ of too large a dimension must contain a function $f$ such that $\|f\|_q \gg \|f\|_2$. For example, if $\dim(V) \gg \sqrt{n}$, then there must be $f$ such that $\|f\|_4 \gg \|f\|_2$. This means that if we want to distinguish between, say, the case that $\|V\|_{2\to4} \le 2$ and the case that $\|V\|_{2\to4} \ge 3$, then we can assume without loss of generality that $\dim(V) = O(\sqrt{n})$, in which case we can solve the problem in $\exp(O(\sqrt{n}))$ time. To get intuition, consider the case that $V$ is spanned by an orthonormal basis $f^1, \ldots, f^d$ of functions whose entries are all in $\pm1$. Then clearly we can find coefficients $a_1, \ldots, a_d \in \{\pm1\}$ such that the first coordinate of $g = \sum_j a_j f^j$ is equal to $d$, which means that its 4-norm is at least $(d^4/n)^{1/4} = d/n^{1/4}$. On the other hand, since the basis is orthonormal, the 2-norm of $g$ equals $\sqrt{d}$, which is $\ll d/n^{1/4}$ for $d \gg \sqrt{n}$.
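The $\pm1$-basis intuition can be checked directly. As a hypothetical instance, the Python sketch below takes $V$ to be spanned by $d$ Walsh characters on $n = 2^m$ points (these are orthonormal for the expectation inner product and have $\pm1$ entries), sets $a_j$ to the value of $f^j$ at the first coordinate, and compares $\|g\|_2 = \sqrt{d}$ with the lower bound $d/n^{1/4}$ on $\|g\|_4$, using expectation norms throughout. The choice of Walsh characters and the helper names are ours.

```python
import itertools

def walsh_basis(m):
    # The 2^m characters chi_S(x) = (-1)^{<S,x>} on the n = 2^m points of {0,1}^m.
    pts = list(itertools.product((0, 1), repeat=m))
    return [[(-1) ** sum(s * x for s, x in zip(S, p)) for p in pts] for S in pts]

def exp_norm(v, p):
    # Expectation norm ||v||_p = (E |v_i|^p)^(1/p).
    return (sum(abs(t) ** p for t in v) / len(v)) ** (1.0 / p)

def spike(basis, d):
    # g = sum_j a_j f^j with a_j = f^j[0] in {+1,-1}, so that g[0] = d.
    chosen = basis[:d]
    return [sum(f[0] * f[i] for f in chosen) for i in range(len(chosen[0]))]

if __name__ == "__main__":
    m = 8
    n = 2 ** m
    basis = walsh_basis(m)
    for d in (4, 16, 64, 128):
        g = spike(basis, d)
        print(d, exp_norm(g, 2), exp_norm(g, 4), d / n ** 0.25)
```

Once $d$ exceeds $\sqrt{n} = 16$, the bound $d/n^{1/4}$ on the 4-norm overtakes the 2-norm $\sqrt{d}$.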

Note the similarity between this algorithm and [ABS10]'s algorithm for Small-Set Expansion, which also worked by showing that if the dimension of the top eigenspace of a graph is too large then it cannot be a small-set expander. Indeed, using our reduction of Small-Set Expansion to the $2\to q$ norm, we can reproduce a similar result to [ABS10].

4.2 Bounding the value of SoS relaxations 

We show that in several cases, the SoS SDP hierarchy gives strong bounds on various instances. At the heart of these results is a general approach of "lifting" proofs about one-dimensional objects into the SoS relaxation domain. Thus we transform the prior proofs that these instances have small objective value into a proof that the SoS relaxation also has a small objective value. The crucial observation is that many proofs boil down to the simple fact that a sum of squares of numbers is always non-negative. It turns out that this "sum of squares" axiom is surprisingly powerful (e.g., it implies a version of the Cauchy-Schwarz inequality given by Lemma A.4).
4.3 The 2-to-4 norm and small-set expansion

Bounds on the $p \to q$ norm of operators for $p < q$ have been used to show fast convergence of Markov chains. In particular, it is known that if the projector to the top eigenspace of a graph $G$ has bounded $2\to4$ norm, then that graph is a small-set expander in the sense that sets of $o(1)$ fraction of the vertices have most of their edges exit the set. In this work we show a converse to this statement, proving that if $G$ is a small-set expander, then the corresponding projector has bounded $2\to4$ norm. As mentioned above, one corollary of this result is that a good (i.e., dimension-independent) approximation to the $2\to4$ norm will refute the Small-Set Expansion hypothesis of [RS10].

We give a rough sketch of the proof. Suppose that $G$ is a sufficiently strong small-set expander, in the sense that every set $S$ with $|S| \le \delta |V(G)|$ has all but a tiny fraction of the edges $(u, v)$ with $u \in S$ satisfying $v \notin S$. Let $f$ be a function in the eigenspace of $G$ corresponding to eigenvalues larger than, say, $0.99$. Since $f$ is in the top eigenspace, for the purposes of this sketch let's imagine that it satisfies

$$\forall x \in V(G), \quad \mathbb{E}_{y \sim x} f(y) \ge 0.9 f(x), \qquad (4.1)$$

where the expectation is over a random neighbor $y$ of $x$. Now, suppose that $\mathbb{E} f(x)^2 = 1$ but $\mathbb{E} f(x)^4 = C$ for some $C \gg \mathrm{poly}(1/\delta)$. That means that most of the contribution to the 4-norm comes from the set $S$ of vertices $x$ such that $f(x) \ge (1/2)C^{1/4}$, but $|S| \ll \delta |V(G)|$. Moreover, suppose for simplicity that $f(x) \in ((1/2)C^{1/4}, 2C^{1/4})$ for all $x \in S$. In this case, condition (4.1) together with the small-set expansion condition implies that most vertices $y$ in $\Gamma(S)$ (the neighborhood of $S$) have $f(y)$ of order $C^{1/4}$. But the small-set expansion condition, together with the regularity of the graph, implies that $|\Gamma(S)| \ge 200|S|$ (say), which implies that $\mathbb{E} f(x)^4 \ge 2C$, a contradiction.

The actual proof is more complicated, since we can't assume the condition (4.1). Instead we will approximate it by assuming that $f$ is the function in the top eigenspace that maximizes the ratio $\|f\|_4/\|f\|_2$. See Section 8 for the details.

4.4 The 2-to-4 norm and the injective tensor norm 

To relate the $2\to4$ norm to the injective tensor norm, we start by establishing equivalences between the $2\to4$ norm and a variety of different tensor problems. Some of these are straightforward exercises in linear algebra, analogous to proving that the largest eigenvalue of $M^\top M$ equals the square of the operator norm of $M$.

One technically challenging reduction is between the problem of optimizing a general degree-4 polynomial $f(x)$ for $x \in \mathbb{R}^n$ and a polynomial that can be written as the sum of fourth powers of linear functions of $x$. Straightforward approaches will magnify errors by $\mathrm{poly}(n)$ factors, which would make it impossible to rule out a PTAS for the $2\to4$ norm. This would still be enough to prove that a $1/\mathrm{poly}(n)$ additive approximation is NP-hard. However, to handle constant-factor approximations, we will instead use a variant of a reduction in [HM10]. This will allow us to map a general tensor optimization problem (corresponding to a general degree-4 polynomial) to a $2\to4$ norm calculation without losing very much precision.

To understand this reduction, we first introduce the $n^2 \times n^2$ matrix $A_{2,2}$ (defined in Section 9) with the property that $\|A\|_{2\to4}^4 = \max_z z^\top A_{2,2} z$, where the maximum is taken over unit vectors $z$ that can be written in the form $x \otimes y$. Without this last restriction, the maximum would simply be the operator norm of $A_{2,2}$. Operationally, we can think of $A_{2,2}$ as a quantum measurement operator, and vectors of the form $x \otimes y$ as unentangled states (equivalently, we say that vectors in this form are tensor product states, or simply "product states"). Thus the difference between $\|A\|_{2\to4}^4$ and $\|A_{2,2}\|_{2\to2}$ can be thought of as the extent to which the measurement can notice the difference between product states and (some) entangled states.
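The identity behind $A_{2,2}$ can be checked numerically. Assuming the counting-norm convention $A_{2,2} = \sum_i (a_i a_i^\top) \otimes (a_i a_i^\top) = \sum_i (a_i \otimes a_i)(a_i \otimes a_i)^\top$ for the rows $a_i$ of $A$ (Section 9 may normalize differently, for example by $1/m$ when expectation norms are used), the Python sketch below checks that $(x \otimes y)^\top A_{2,2} (x \otimes y) = \sum_i \langle a_i, x\rangle^2 \langle a_i, y\rangle^2$, which for $x = y$ equals $\|Ax\|_4^4$. Plain lists are used to keep it dependency-free; the helper names are ours.

```python
import random

def kron(x, y):
    # Vector Kronecker product: (x ⊗ y)[i*len(y) + j] = x[i] * y[j].
    return [xi * yj for xi in x for yj in y]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def a22(A):
    # Assumed normalization: A_{2,2} = sum_i (a_i ⊗ a_i)(a_i ⊗ a_i)^T, an n^2 x n^2 matrix.
    N = len(A[0]) ** 2
    M = [[0.0] * N for _ in range(N)]
    for a in A:
        aa = kron(a, a)
        for r in range(N):
            for c in range(N):
                M[r][c] += aa[r] * aa[c]
    return M

def quad(M, z):
    # Quadratic form z^T M z.
    return sum(z[r] * M[r][c] * z[c] for r in range(len(z)) for c in range(len(z)))

if __name__ == "__main__":
    rng = random.Random(3)
    A = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(5)]
    x = [rng.gauss(0, 1) for _ in range(3)]
    # On a product state x ⊗ x the form equals ||Ax||_4^4 (counting norm).
    print(quad(a22(A), kron(x, x)), sum(dot(a, x) ** 4 for a in A))
```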

Next, we define a matrix $A'$ whose rows are of the form $(x' \otimes y')^\top \sqrt{A_{2,2}}$, where $x', y' \in \mathbb{R}^n$ range over a distribution that approximates the uniform distribution. If $A'$ acts on a vector of the form $x \otimes y$, then the maximum output 4-norm (over $L_2$-unit vectors $x, y$) is precisely $\|A\|_{2\to4}$. Intuitively, if $A'$ acts on a highly "entangled" vector $z$, meaning that $\langle z, x \otimes y\rangle$ is small for all unit vectors $x, y$, then $\|A'z\|_4$ should be small. This is because $z$ will have small overlap with $x' \otimes y'$, and $A_{2,2}$ is positive semidefinite, so its off-diagonal entries can be upper-bounded in terms of its operator norm. These arguments (detailed in Section 9.2) lead to only modest bounds on $A'$, but then we can use an amplification argument to make the $2\to4$ norm of $A'$ depend more sensitively on that of $A$, at the cost of blowing up the dimension by a polynomial factor.

The reductions we achieve also permit us, in Section 9.3, to relate our Tensor-SDP algorithm with the sum-of-squares relaxation used by Doherty, Parrilo, and Spedalieri [DPS04] (henceforth DPS). We show the two relaxations are essentially equivalent, allowing us to import results proved, in some cases, with techniques from quantum information theory. One such result, from [BaCY11], requires relating $A_{2,2}$ to a quantum measurement of the 1-LOCC form. This means that there are two $n$-dimensional subsystems, combined via tensor products, and $A_{2,2}$ can be implemented as a measurement on the first subsystem followed by a measurement on the second subsystem that is chosen conditioned on the results of the first measurement. The main result of [BaCY11] proved that such LOCC measurements exhibit much better behavior under DPS, and they obtain nontrivial approximation guarantees with only $O(\log(n)/\varepsilon^2)$ rounds. Since this is achieved by DPS, it also implies an upper bound on the error of Tensor-SDP. This upper bound is $\varepsilon Z$, where $Z$ is the smallest number for which $A_{2,2} \preceq Z \cdot M$ for some 1-LOCC measurement $M$. While $Z$ is not believed to be efficiently computable, it is at least $\|A_{2,2}\|_{2\to2}$, since any measurement $M$ has $\|M\|_{2\to2} \le 1$. To upper bound $Z$, we can explicitly construct $A_{2,2}$ as a quantum measurement. This is done by the following protocol. Let $a_1, \ldots, a_m$ be the rows of $A$. One party performs the quantum measurement with outcomes $\{\sigma a_i a_i^\top\}_{i=1}^m$ (where $\sigma$ is a normalization factor) and transmits the outcome $i$ to the other party. Upon receiving message $i$, the second party performs the two-outcome measurement $\{\beta a_i a_i^\top, I - \beta a_i a_i^\top\}$ and outputs 0 or 1 accordingly, where $\beta$ is another normalization factor. The measurement $A_{2,2}$ corresponds to the "0" outcomes. For this to be a physically realizable 1-LOCC measurement, we need $\sigma \le \|A^\top A\|_{2\to2}^{-1}$ and $\beta \le \|A\|_{2\to\infty}^{-2}$. Combining these ingredients, we obtain the approximation guarantee in Theorem 2.3. More details on this argument are in Section 9.3.1.

4.5 Definitions and Notation 

Let $\mathcal{U}$ be some finite set. For concreteness, and without loss of generality, we can let $\mathcal{U}$ be the set $\{1, \ldots, n\}$, where $n$ is some positive integer. We write $\mathbb{E}_{\mathcal{U}} f$ to denote the average value of a function $f\colon \mathcal{U} \to \mathbb{R}$ over a random point in $\mathcal{U}$ (omitting the subscript $\mathcal{U}$ when it is clear from the context). We let $L_2(\mathcal{U})$ denote the space of functions $f\colon \mathcal{U} \to \mathbb{R}$ endowed with the inner product $\langle f, g\rangle = \mathbb{E}_{\mathcal{U}} fg$ and its induced norm $\|f\| = \langle f, f\rangle^{1/2}$. For $p \ge 1$, the $p$-norm of a function $f \in L_2(\mathcal{U})$ is defined as $\|f\|_p = (\mathbb{E}|f|^p)^{1/p}$. A convexity argument shows $\|f\|_p \le \|f\|_q$ for $p \le q$. If $A$ is a linear operator mapping functions from $L_2(\mathcal{U})$ to $L_2(\mathcal{V})$, and $p, q \ge 1$, then the $p$-to-$q$ norm of $A$ is defined as $\|A\|_{p\to q} = \max_{0 \ne f \in L_2(\mathcal{U})} \|Af\|_q / \|f\|_p$. If $V \subseteq L_2(\mathcal{U})$ is a linear subspace, then we denote $\|V\|_{p\to q} = \max_{f \in V} \|f\|_q / \|f\|_p$.

Counting norms. In most of this paper we use expectation norms defined as above, but in some contexts the counting norms will be more convenient. We will stick to the convention that functions use expectation norms while vectors use the counting norms. For a vector $v \in \mathbb{C}^{\mathcal{U}}$ and $p \ge 1$, the $p$ counting norm of $v$, denoted $\|v\|_p$, is defined to be $\big(\sum_{i \in \mathcal{U}} |v_i|^p\big)^{1/p}$. The counting inner product of two vectors $u, v \in \mathbb{R}^{\mathcal{U}}$, denoted as $\langle u, v\rangle$, is defined to be $\sum_{i \in \mathcal{U}} u_i v_i$.
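To make the two conventions concrete, the Python sketch below computes both norms for the same array of $n = 4$ values (the example array and function names are ours). Expectation norms are nondecreasing in $p$, matching the convexity remark above, whereas counting norms are nonincreasing in $p$; for each fixed $p$ the two differ by the factor $n^{1/p}$.

```python
def expectation_norm(f, p):
    # Functions: ||f||_p = (E_u |f(u)|^p)^(1/p), with u uniform over the domain.
    return (sum(abs(t) ** p for t in f) / len(f)) ** (1.0 / p)

def counting_norm(v, p):
    # Vectors: ||v||_p = (sum_i |v_i|^p)^(1/p).
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

if __name__ == "__main__":
    f = [3.0, 0.0, 0.0, 1.0]
    for p in (1, 2, 4):
        # Expectation norms grow with p; counting norms shrink with p.
        print(p, expectation_norm(f, p), counting_norm(f, p))
```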

5 The Tensor-SDP algorithm 

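There is a very natural SoS program for the $2\to4$ norm of a given linear operator $A\colon L_2(\mathcal{U}) \to L_2(\mathcal{V})$.

Algorithm Tensor-SDP$^{(d)}(A)$:
Maximize $\tilde{\mathbb{E}}_f \|Af\|_4^4$ subject to

- $f$ is a $d$-f.r.v. over $L_2(\mathcal{U})$,

- $\tilde{\mathbb{E}}_f (\|f\|^2 - 1)^2 = 0$.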

Note that $\|Af\|_4^4$ is indeed a degree-4 polynomial in the variables $\{f(u)\}_{u \in \mathcal{U}}$. The Tensor-SDP$^{(d)}$ algorithm makes sense for $d \ge 4$, and we denote by Tensor-SDP its most basic version where $d = 4$. The Tensor-SDP algorithm applies not just to the $2\to4$ norm, but to optimizing general polynomials over the unit ball of $L_2(\mathcal{U})$ by replacing $\|Af\|_4^4$ with an arbitrary polynomial $P$.

While we do not know the worst-case performance of the Tensor-SDP algorithm, we do 
know that it performs well on random instances (see Section 7), and (perhaps more relevant 
to the UGC) on the projector to low-degree polynomials (see Theorem 2.2). The latter is a 
corollary of the following result: 

Lemma 5.1. Over the space of $n$-variate Fourier polynomials* $f$ with degree at most $d$,

$$\mathbb{E} f^4 \preceq 9^d \big(\mathbb{E} f^2\big)^2,$$

where the expectations are over $\{\pm1\}^n$.

Proof. The result is proven by a careful variant of the standard inductive proof of the hypercontractivity for low-degree polynomials (see e.g. [O'D07]). We include it in this part of the paper since it is the simplest example of how to "lift" known proofs about functions over the reals into proofs about the fictitious random variables that arise in semidefinite programming hierarchies. To strengthen the inductive hypothesis, we will prove the more general statement that for $f$ and $g$ being $n$-variate Fourier polynomials with degrees at most $d$ and $e$, it holds that $\mathbb{E} f^2 g^2 \preceq 9^{(d+e)/2} (\mathbb{E} f^2)(\mathbb{E} g^2)$. (Formally, this polynomial relation is over the linear space of pairs of $n$-variate Fourier polynomials $(f, g)$, where $f$ has degree at most $d$ and $g$ has degree at most $e$.) The proof is by induction on the number of variables.

If one of the functions is constant (so that $d = 0$ or $e = 0$), then $\mathbb{E} f^2 g^2 = (\mathbb{E} f^2)(\mathbb{E} g^2)$, as desired. Otherwise, let $f_0, f_1, g_0, g_1$ be Fourier polynomials depending only on $x_1, \ldots, x_{n-1}$ such that $f(x) = f_0(x) + x_n f_1(x)$ and $g(x) = g_0(x) + x_n g_1(x)$. The Fourier polynomials $f_0, f_1, g_0, g_1$ depend linearly on $f$ and $g$ (because $f_0(x) = \frac12 f(x_1, \ldots, x_{n-1}, 1) + \frac12 f(x_1, \ldots, x_{n-1}, -1)$ and $f_1(x) = \frac12 f(x_1, \ldots, x_{n-1}, 1) - \frac12 f(x_1, \ldots, x_{n-1}, -1)$). Furthermore, the degrees of $f_0$, $f_1$, $g_0$, and $g_1$ are at most $d$, $d-1$, $e$, and $e-1$, respectively.

* An $n$-variate Fourier polynomial with degree at most $d$ is a function $f\colon \{\pm1\}^n \to \mathbb{R}$ of the form $f = \sum_{\alpha \subseteq [n],\, |\alpha| \le d} f_\alpha \chi_\alpha$, where $\chi_\alpha(x) = \prod_{i \in \alpha} x_i$.

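Since $\mathbb{E} x_n = \mathbb{E} x_n^3 = 0$, if we expand $\mathbb{E} f^2 g^2 = \mathbb{E}(f_0 + x_n f_1)^2 (g_0 + x_n g_1)^2$, then the terms where $x_n$ appears in an odd power vanish, and we obtain

$$\mathbb{E} f^2 g^2 = \mathbb{E}\big[ f_0^2 g_0^2 + f_1^2 g_1^2 + f_0^2 g_1^2 + f_1^2 g_0^2 + 4 f_0 f_1 g_0 g_1 \big].$$

By expanding the square expression $2\mathbb{E}(f_0 g_1 - f_1 g_0)^2$, we get $4\mathbb{E} f_0 f_1 g_0 g_1 \preceq 2\mathbb{E} f_0^2 g_1^2 + 2\mathbb{E} f_1^2 g_0^2$ and thus

$$\mathbb{E} f^2 g^2 \preceq \mathbb{E} f_0^2 g_0^2 + \mathbb{E} f_1^2 g_1^2 + 3\,\mathbb{E} f_0^2 g_1^2 + 3\,\mathbb{E} f_1^2 g_0^2. \qquad (5.1)$$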

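Applying the induction hypothesis to all four terms on the right-hand side of (5.1) (using for the last two terms that the degrees of $f_1$ and $g_1$ are at most $d-1$ and $e-1$),

$$\begin{aligned} \mathbb{E} f^2 g^2 &\preceq 9^{(d+e)/2} (\mathbb{E} f_0^2)(\mathbb{E} g_0^2) + 9^{(d+e)/2} (\mathbb{E} f_1^2)(\mathbb{E} g_1^2) \\ &\quad + 3 \cdot 9^{(d+e-1)/2} (\mathbb{E} f_0^2)(\mathbb{E} g_1^2) + 3 \cdot 9^{(d+e-1)/2} (\mathbb{E} f_1^2)(\mathbb{E} g_0^2) \\ &= 9^{(d+e)/2} (\mathbb{E} f_0^2 + \mathbb{E} f_1^2)(\mathbb{E} g_0^2 + \mathbb{E} g_1^2). \end{aligned}$$

Since $\mathbb{E} f_0^2 + \mathbb{E} f_1^2 = \mathbb{E}(f_0 + x_n f_1)^2 = \mathbb{E} f^2$ (using $\mathbb{E} x_n = 0$) and similarly $\mathbb{E} g_0^2 + \mathbb{E} g_1^2 = \mathbb{E} g^2$, we derive the desired relation $\mathbb{E} f^2 g^2 \preceq 9^{(d+e)/2} (\mathbb{E} f^2)(\mathbb{E} g^2)$. □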
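The algebra of the inductive step holds for arbitrary real functions on the cube, not only low-degree ones, so it can be checked exactly. The Python sketch below (illustrative; the helper names are ours) splits hypothetical integer-valued $f$ and $g$ on $\{\pm1\}^n$ as $f_0 + x_n f_1$ and $g_0 + x_n g_1$, confirms that $\mathbb{E} f^2 g^2$ equals its even-term expansion, and confirms the inequality (5.1), whose slack is exactly $2\mathbb{E}(f_0 g_1 - f_1 g_0)^2$. Exact rationals avoid any floating-point doubt.

```python
import itertools
import random
from fractions import Fraction

def split_last(f, n):
    # f(x', x_n) = f0(x') + x_n * f1(x') with
    # f0 = (f(x',1) + f(x',-1)) / 2 and f1 = (f(x',1) - f(x',-1)) / 2.
    f0, f1 = {}, {}
    for xp in itertools.product((-1, 1), repeat=n - 1):
        hi, lo = f[xp + (1,)], f[xp + (-1,)]
        f0[xp] = Fraction(hi + lo, 2)
        f1[xp] = Fraction(hi - lo, 2)
    return f0, f1

def mean(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

def check_inductive_step(f, g, n):
    """Return (E f^2 g^2, its even-term expansion, right-hand side of (5.1))."""
    f0, f1 = split_last(f, n)
    g0, g1 = split_last(g, n)
    lhs = mean(f[x] ** 2 * g[x] ** 2 for x in f)
    pts = list(f0)
    expansion = mean(f0[x]**2 * g0[x]**2 + f1[x]**2 * g1[x]**2 + f0[x]**2 * g1[x]**2
                     + f1[x]**2 * g0[x]**2 + 4 * f0[x] * f1[x] * g0[x] * g1[x] for x in pts)
    rhs_51 = mean(f0[x]**2 * g0[x]**2 + f1[x]**2 * g1[x]**2
                  + 3 * f0[x]**2 * g1[x]**2 + 3 * f1[x]**2 * g0[x]**2 for x in pts)
    return lhs, expansion, rhs_51

if __name__ == "__main__":
    rng = random.Random(5)
    n = 4
    cube = list(itertools.product((-1, 1), repeat=n))
    f = {x: rng.randint(-3, 3) for x in cube}
    g = {x: rng.randint(-3, 3) for x in cube}
    lhs, expansion, rhs_51 = check_inductive_step(f, g, n)
    print(lhs == expansion, lhs <= rhs_51)  # prints: True True
```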

6 SoS succeeds on Unique Games integrality gaps 

In this section we prove Theorem 2.6, showing that 8 rounds of the SoS hierarchy can beat 
the Basic SDP program on the canonical integrality gaps considered in the literature. 

Theorem 6.1 (Theorem 2.6, restated). For sufficiently small $\varepsilon$ and large $k$, and every $n \in \mathbb{N}$, let $\mathcal{W}$ be an $n$-variable $k$-alphabet Unique Games instance of the type considered in [RS09, KS09, KPS10] obtained by composing the "quotient noisy cube" instance of [KV05] with the long-code alphabet reduction of [KKMO04] so that the best assignment to $\mathcal{W}$'s variables satisfies at most an $\varepsilon$ fraction of the constraints. Then, on input $\mathcal{W}$, eight rounds of the SoS relaxation output at most $1/100$.

6.1 Proof sketch of Theorem 6.1 

The proof is very technical, as it is obtained by taking the already rather technical proofs of soundness for these instances and "lifting" each step into the SoS hierarchy, a procedure that causes additional difficulties. The high-level structure of all integrality gap instances constructed in the literature was the following: Start with a basic integrality gap instance $\mathcal{U}$ of Unique Games where the Basic SDP outputs $1 - o(1)$ but the true optimum is $o(1)$; the alphabet size of $\mathcal{U}$ is (necessarily) $R = \omega(1)$. Then, apply an alphabet-reduction gadget (such as the long code, or, in the recent work [BGH+11], the so-called "short code") to transform $\mathcal{U}$ into an instance $\mathcal{W}$ with some constant alphabet size $k$. The soundness proof of the gadget guarantees that the true optimum of $\mathcal{W}$ is small, while the analysis of previous works managed to "lift" the completeness proofs and argue that the instance $\mathcal{W}$ survives a number of rounds that tends to infinity as $\varepsilon$ tends to zero, where $1 - \varepsilon$ is the completeness value in the gap constructions, and the exact tradeoff between the number of rounds and $\varepsilon$ depends on the paper and hierarchy.

The fact that the basic instance $\mathcal{U}$ has small integral value can be shown by appealing to hypercontractivity of low-degree polynomials, and hence can be "lifted" to the SoS world via Lemma 5.1. The bulk of the technical work is in lifting the soundness proof of the gadget. On a high level this proof involves the following components: (1) the invariance principle of [MOO05], saying that low-influence functions cannot distinguish between the cube and the sphere (this allows us to argue that functions that perform well on the gadget must have an influential variable), and (2) the influence decoding procedure of [KKMO04] that maps these influential functions on each local gadget into a good global assignment for the original instance $\mathcal{U}$.

The invariance principle poses a special challenge, since the proof of [MOO05] uses so-called "bump" functions, which are not at all low-degree polynomials.† We use a weaker invariance principle, only showing that the 4-norm of a low-influence function remains the same between two probability spaces that agree on the first 2 moments. Unlike the usual invariance principle, we do not move between Bernoulli variables and Gaussian space, but rather between two different distributions on the discrete cube. It turns out that for the purposes of these Unique Games integrality gaps, the above suffices. The lifted invariance principle is proven via a "hybrid" argument similar to the argument of [MOO05], where hypercontractivity of low-degree polynomials again plays an important role.

The soundness analysis of [KKMO04] is obtained by replacing each local function with an average over its neighbors, and then choosing a random influential coordinate from the new local function as an assignment for the original unique games instance. We follow the same approach, though even simple tasks such as independent randomized rounding turn out to be much subtler in the lifted setting. However, it turns out that by making appropriate modifications to the analysis, it can be lifted to complete the proof of Theorem 2.6.

In the following, we give a more technical description of the proof. Let $T_{1-\eta}$ be the $\eta$-noise graph on $\{\pm1\}^R$. Khot and Vishnoi [KV05] constructed a unique game $\mathcal{U}$ with label-extended graph $T_{1-\eta}$. A solution to the level-4 SoS relaxation of $\mathcal{U}$ is a 4-f.r.v. $h$ over $L_2(\{\pm1\}^R)$. This variable satisfies $h(x)^2 = h(x)$ for all $x \in \{\pm1\}^R$ and also $\tilde{\mathbb{E}}_h (\mathbb{E} h)^2 \le 1/R^2$. (The variable $h$ encodes a 0/1 assignment to the vertices of the label-extended graph. A proper assignment assigns 1 only to a $1/R$ fraction of these vertices.) Lemma 6.7 allows us to bound the objective value of the solution $h$ in terms of the fourth moment $\tilde{\mathbb{E}}_h \mathbb{E}(P_{>\lambda} h)^4$, where $P_{>\lambda}$ is the projector into the span of the eigenfunctions of $T_{1-\eta}$ with eigenvalue larger than a threshold $\lambda$. (Note that $\mathbb{E}(P_{>\lambda} h)^4$ is a degree-4 polynomial in $h$.) For the graph $T_{1-\eta}$, we can bound the degree of $P_{>\lambda} h$ as a Fourier polynomial (by about $\log(R)$). Hence, the hypercontractivity bound (Lemma 5.1) allows us to bound the fourth moment $\tilde{\mathbb{E}}_h \mathbb{E}(P_{>\lambda} h)^4$ in terms of $\tilde{\mathbb{E}}_h (\mathbb{E} h^2)^2$. By our assumptions on $h$, we have $\tilde{\mathbb{E}}_h (\mathbb{E} h^2)^2 = \tilde{\mathbb{E}}_h (\mathbb{E} h)^2 \le 1/R^2$. Plugging these bounds into the bound of Lemma 6.7 demonstrates that the objective value of $h$ is bounded by $1/R^{\Omega(\eta)}$ (see Theorem 6.11).

Next, we consider a unique game $\mathcal{W}$ obtained by composing the unique game $\mathcal{U}$ with the alphabet reduction of [KKMO04]. Suppose that $\mathcal{W}$ has alphabet $\Omega = \{0, \ldots, k-1\}$. The vertex set of $\mathcal{W}$ is $V \times \Omega^R$ (with $V$ being the vertex set of $\mathcal{U}$). Let $f = \{f_u\}_{u \in V}$ be a solution to the level-8 SoS relaxation of $\mathcal{W}$. To bound the objective value of $f$, we derive from it a level-4 random variable $h$ over $L_2(V \times [R])$ (encoding a function on the label-extended graph of the unique game $\mathcal{U}$). We define $h(u, r) = \mathrm{Inf}_r^{(\le \ell)} \bar f_u$, where $\ell = \log k$ and $\bar f_u$ is a variable of $L_2(\Omega^R)$ obtained by averaging certain values of $f_u$ ("folding"). It is easy to show that $h^2 \preceq h$ (using Lemma A.1) and $\tilde{\mathbb{E}}_h (\mathbb{E} h)^2 \le \ell/R$ (bound on the total influence of low-degree Fourier polynomials). Theorem 6.9 (influence decoding) allows us to bound the objective value of $f$ in terms of the correlation of $h$ with the label-extended graph of $\mathcal{U}$ (in our case, $T_{1-\eta}$). Here, we can use again Theorem 6.11 to show that the correlation of $h$ with the graph $T_{1-\eta}$ is very small. (An additional challenge arises because $h$ does not satisfy $h^2 = h$, but only the weaker condition $h^2 \preceq h$. Corollary 6.5 fixes this issue by simulating independent rounding for fictitious random variables.) To prove Theorem 6.9 (influence decoding), we analyze the behavior of fictitious random variables on the alphabet-reduction gadget of [KKMO04]. This alphabet-reduction gadget essentially corresponds to the $\varepsilon$-noise graph $T_{1-\varepsilon}$ on $\Omega^R$. Suppose $g$ is a fictitious random variable over $L_2(\Omega^R)$ satisfying $g^2 \preceq g$. By Lemma 6.7, we can bound the correlation of $g$ with the graph $T_{1-\varepsilon}$ in terms of the fourth moment of $P_{>\lambda} g$. At this point, the hypercontractivity bound (Lemma 5.1) is too weak to be helpful. Instead we show an "invariance principle" result (Theorem 6.2), which allows us to relate the fourth moment of $P_{>\lambda} g$ to the fourth moment of a nicer random variable and the influences of $g$.

† A similar, though not identical, challenge arises in [BGH+11], where they need to extend the invariance principle to the "short code" setting. However, their solution does not seem to apply in our case, and we use a different approach.

Organization of the proof. We now turn to the actual proof of Theorem 6.1. The proof consists of lifting to the SoS hierarchy all the steps used in the analysis of previous integrality gaps, which themselves arise from hardness reductions. We start in Section 6.2 by showing a sum-of-squares proof for a weaker version of [MOO05]'s invariance principle. Then in Section 6.3 we show how one can perform independent rounding in the SoS world (this is a trivial step in proofs involving true random variables, but becomes much more subtle when dealing with SoS solutions). In Sections 6.4 and 6.5 we lift variants of the [KKMO04] dictatorship test. The proof uses a SoS variant of influence decoding, which is covered in Section 6.6. Together all these sections establish SoS analogs of the soundness properties of the hardness reduction used in the previous results. Then, in Section 6.7 we show that the analysis of the basic instance has a sum-of-squares proof (since it is based on hypercontractivity of low-degree polynomials). Finally, in Section 6.8 we combine all these tools to conclude the proof. In Section 6.9 we discuss why this proof applies (with some modifications) also to the "short-code" based instances of [BGH+11].

6.2 Invariance Principle for Fourth Moment 

In this section, we will give a sum-of-squares proof for a variant of the invariance principle of [MOO05]. Instead of general smooth functionals (usually constructed from "bump functions"), we show invariance only for the fourth moment. It turns out that invariance of the fourth moment is enough for our applications.

Let $k = 2^t$ for $t \in \mathbb{N}$ and let $\mathcal{X} = (\mathcal{X}_1, \ldots, \mathcal{X}_R)$ be an independent sequence‡ of orthonormal ensembles $\mathcal{X}_r = (X_{r,0}, \ldots, X_{r,k-1})$. Concretely, we choose $X_{r,j} = \chi_j(x_r)$, where $\chi_0, \ldots, \chi_{k-1}$ is the set of characters of $\mathbb{F}_2^t$ and $x$ is sampled uniformly from $(\mathbb{F}_2^t)^R$. Every random variable over $(\mathbb{F}_2^t)^R$ can be expressed as a multilinear polynomial over the sequence $\mathcal{X}$. In this sense, $\mathcal{X}$ is maximally dependent. On the other hand, let $\mathcal{Y} = (\mathcal{Y}_1, \ldots, \mathcal{Y}_R)$ be a sequence of ensembles $\mathcal{Y}_r = (Y_{r,0}, \ldots, Y_{r,k-1})$, where $Y_{r,0} = 1$ and the $Y_{r,j}$ are independent, unbiased $\{\pm1\}$ Bernoulli variables. The sequence $\mathcal{Y}$ is maximally independent since it consists of completely independent random variables.

‡ An orthonormal ensemble is a collection of orthonormal real-valued random variables, one being the constant 1. A sequence of such ensembles is independent if each ensemble is defined over an independent probability space. (See [MOO05] for details.)

Let $f$ be a 4-f.r.v. over the space of multilinear polynomials with degree at most $\ell$ and monomials indexed by $[k]^R$. Suppose $\tilde{\mathbb{E}}_f \|f\|^4 \le 1$. (In the following, we mildly overload notation and use $[k]$ to denote the set $\{0, \ldots, k-1\}$.) Concretely, we can specify $f$ by the set of monomial coefficients $\{\hat f_\alpha\}_{\alpha \in [k]^R,\, |\alpha| \le \ell}$, where $|\alpha|$ is the number of non-zero entries in $\alpha$. As usual, we define $\mathrm{Inf}_r f = \sum_{\alpha \in [k]^R,\, \alpha_r \ne 0} \hat f_\alpha^2$. Note that $\mathrm{Inf}_r f$ is a degree-2 polynomial in $f$. (Hence, the pseudo-expectation of $(\mathrm{Inf}_r f)^2$ is defined.)
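For intuition in the simplest case $k = 2$ (so $t = 1$ and each ensemble is $\{1, x_r\}$ on the Boolean cube), the Python sketch below computes $\mathrm{Inf}_r f = \sum_{\alpha_r \ne 0} \hat f_\alpha^2$ from brute-force Fourier coefficients and compares it with $\mathbb{E}(D_r f)^2$, where $D_r f = f - E_r f$ and $E_r f$ averages out coordinate $r$. It concerns genuine functions rather than fictitious random variables, and the helper names are ours; the accompanying checks also confirm the total-influence bound $\sum_r \mathrm{Inf}_r f \le \deg(f)\,\|f\|^2$ used later in the proof of Theorem 6.2.

```python
import itertools
from fractions import Fraction

def fourier(f, n):
    # f_hat(alpha) = E_x f(x) chi_alpha(x); alpha is a 0/1 tuple, x ranges over {+1,-1}^n.
    cube = list(itertools.product((-1, 1), repeat=n))
    coeffs = {}
    for alpha in itertools.product((0, 1), repeat=n):
        total = 0
        for x in cube:
            chi = 1
            for a, xi in zip(alpha, x):
                if a:
                    chi *= xi
            total += f[x] * chi
        coeffs[alpha] = Fraction(total, len(cube))
    return coeffs

def influence_fourier(coeffs, r):
    # Inf_r f = sum of f_hat(alpha)^2 over alpha with alpha_r != 0.
    return sum((c * c for a, c in coeffs.items() if a[r]), Fraction(0))

def influence_derivative(f, n, r):
    # E (D_r f)^2, where D_r f(x) = x_r * (f(x with x_r=1) - f(x with x_r=-1)) / 2.
    total = Fraction(0)
    for x in f:
        up = x[:r] + (1,) + x[r + 1:]
        down = x[:r] + (-1,) + x[r + 1:]
        total += Fraction(f[up] - f[down], 2) ** 2
    return total / len(f)

if __name__ == "__main__":
    n = 3
    maj = {x: (1 if sum(x) > 0 else -1) for x in itertools.product((-1, 1), repeat=n)}
    coeffs = fourier(maj, n)
    print([influence_fourier(coeffs, r) for r in range(n)])  # prints: [Fraction(1, 2), Fraction(1, 2), Fraction(1, 2)]
```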

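Theorem 6.2 (Invariance Principle for Fourth Moment). For $\tau = \tilde{\mathbb{E}}_f \sum_r (\mathrm{Inf}_r f)^2$,

$$\tilde{\mathbb{E}}_f \mathbb{E}_{\mathcal{X}} f^4 = \tilde{\mathbb{E}}_f \mathbb{E}_{\mathcal{Y}} f^4 \pm k^{O(\ell)} \sqrt{\tau}.$$

(Since the expressions $\mathbb{E}_{\mathcal{X}} f^4$ and $\mathbb{E}_{\mathcal{Y}} f^4$ are degree-4 polynomials in $f$, their pseudo-expectations are defined.)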

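Using the SoS proof for hypercontractivity of low-degree polynomials (over the ensemble $\mathcal{Y}$), the fourth moment $\tilde{\mathbb{E}}_f \mathbb{E}_{\mathcal{Y}} f^4$ is bounded in terms of the second moment $\tilde{\mathbb{E}}_f (\mathbb{E}_{\mathcal{Y}} f^2)^2$. Since the first two moments of the ensembles $\mathcal{X}$ and $\mathcal{Y}$ match, we have $\mathbb{E}_{\mathcal{Y}} f^2 = \mathbb{E}_{\mathcal{X}} f^2$ (as polynomials in $f$). Hence, we can bound the fourth moment of $f$ over $\mathcal{X}$ in terms of its second moment and $\tau$.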

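Corollary 6.3.

$$\tilde{\mathbb{E}}_f \mathbb{E}_{\mathcal{X}} f^4 \le 2^{O(\ell)}\, \tilde{\mathbb{E}}_f \big(\mathbb{E}_{\mathcal{X}} f^2\big)^2 + k^{O(\ell)} \sqrt{\tau}.$$

(The corollary shows that for small enough $\tau$, the 4-norm and 2-norm of $f$ are within a factor of $2^{O(\ell)}$. This bound is useful because the worst-case ratio of these norms is $k^{O(\ell)} \gg 2^{O(\ell)}$.)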

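Proof of Theorem 6.2. We consider the following intermediate sequences of ensembles $\mathcal{Z}^{(r)} = (\mathcal{X}_1, \ldots, \mathcal{X}_r, \mathcal{Y}_{r+1}, \ldots, \mathcal{Y}_R)$. Note that $\mathcal{Z}^{(0)} = \mathcal{Y}$ and $\mathcal{Z}^{(R)} = \mathcal{X}$. For $r \in [R]$, we write $f = E_r f + D_r f$, where $E_r f$ is the part of $f$ that does not depend on coordinate $r$ and $D_r f = f - E_r f$. For all $r \in [R]$, the following identities (between polynomials in $f$) hold:

$$\begin{aligned} \mathbb{E}_{\mathcal{Z}^{(r)}} f^4 - \mathbb{E}_{\mathcal{Z}^{(r-1)}} f^4 &= \mathbb{E}_{\mathcal{Z}^{(r)}} (E_r f + D_r f)^4 - \mathbb{E}_{\mathcal{Z}^{(r-1)}} (E_r f + D_r f)^4 \\ &= \mathbb{E}_{\mathcal{Z}^{(r)}} \big[4 (E_r f)(D_r f)^3 + (D_r f)^4\big] - \mathbb{E}_{\mathcal{Z}^{(r-1)}} \big[4 (E_r f)(D_r f)^3 + (D_r f)^4\big]. \end{aligned}$$

The last step uses that the first two moments of the ensembles $\mathcal{X}_r$ and $\mathcal{Y}_r$ match and that $E_r f$ does not depend on coordinate $r$.
Hence,

$$\mathbb{E}_{\mathcal{X}} f^4 - \mathbb{E}_{\mathcal{Y}} f^4 = \sum_r \Big( \mathbb{E}_{\mathcal{Z}^{(r)}} \big[4 (E_r f)(D_r f)^3 + (D_r f)^4\big] - \mathbb{E}_{\mathcal{Z}^{(r-1)}} \big[4 (E_r f)(D_r f)^3 + (D_r f)^4\big] \Big).$$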

It remains to bound the pseudo-expectation of the right-hand side. First, we consider the term $\sum_r \mathbb{E}_{\mathcal{Z}^{(r)}} (D_r f)^4$. The expression $\mathbb{E}_{\mathcal{Z}^{(r)}} (D_r f)^4$ is the fourth moment of a Fourier polynomial with degree at most $t \cdot \ell$. (Here, we use that the ensembles in the sequence $\mathcal{X}$ consist of characters of $\mathbb{F}_2^t$, which are Fourier polynomials of degree at most $t$.) Furthermore, $\mathrm{Inf}_r f = \mathbb{E}_{\mathcal{Z}^{(r)}} (D_r f)^2$ is the second moment of this Fourier polynomial. Hence, by hypercontractivity of low-degree Fourier polynomials, $\sum_r \mathbb{E}_{\mathcal{Z}^{(r)}} (D_r f)^4 \preceq 2^{O(t\ell)} \sum_r (\mathrm{Inf}_r f)^2$. Thus, the pseudo-expectation is at most $\tilde{\mathbb{E}}_f \sum_r \mathbb{E}_{\mathcal{Z}^{(r)}} (D_r f)^4 \le 2^{O(t\ell)} \tau = k^{O(\ell)} \tau$.
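Next, we consider the term $\sum_r \mathbb{E}_{\mathcal{Z}^{(r)}} (E_r f)(D_r f)^3$. (The remaining two terms are analogous.) To bound its pseudo-expectation, we apply Cauchy-Schwarz:

$$\tilde{\mathbb{E}}_f \sum_r \mathbb{E}_{\mathcal{Z}^{(r)}} (E_r f)(D_r f)^3 \le \Big( \tilde{\mathbb{E}}_f \sum_r \mathbb{E}_{\mathcal{Z}^{(r)}} (E_r f)^2 (D_r f)^2 \Big)^{1/2} \Big( \tilde{\mathbb{E}}_f \sum_r \mathbb{E}_{\mathcal{Z}^{(r)}} (D_r f)^4 \Big)^{1/2}. \qquad (6.1)$$

Using hypercontractivity of low-degree Fourier polynomials as above, the expression inside the second factor of (6.1) satisfies $\tilde{\mathbb{E}}_f \sum_r \mathbb{E}_{\mathcal{Z}^{(r)}} (D_r f)^4 \le k^{O(\ell)} \tau$, so the second factor is at most $(k^{O(\ell)} \tau)^{1/2}$. It remains to bound the first factor of (6.1). Again by hypercontractivity, $\mathbb{E}_{\mathcal{Z}^{(r)}} (E_r f)^2 (D_r f)^2 \preceq k^{O(\ell)} \|E_r f\|^2 \cdot \|D_r f\|^2 \preceq k^{O(\ell)} \|f\|^2 \cdot \|D_r f\|^2$. By the total influence bound for low-degree polynomials, we have $\sum_r \|D_r f\|^2 \preceq \ell \|f\|^2$. Thus $\sum_r \mathbb{E}_{\mathcal{Z}^{(r)}} (E_r f)^2 (D_r f)^2 \preceq k^{O(\ell)} \|f\|^4$. Using the assumption $\tilde{\mathbb{E}}_f \|f\|^4 \le 1$, we can bound the first factor of (6.1) by $k^{O(\ell)}$.

We conclude as desired that

$$\tilde{\mathbb{E}}_f \mathbb{E}_{\mathcal{X}} f^4 - \tilde{\mathbb{E}}_f \mathbb{E}_{\mathcal{Y}} f^4 = \pm k^{O(\ell)} \sqrt{\tau}. \qquad \square$$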



6.3 Interlude: Independent Rounding 

In this section, we will show how to convert variables that satisfy $f^2 \preceq f$ to variables $\bar f$ satisfying $\bar f^2 = \bar f$. The derived variables $\bar f$ will inherit several properties of the original variables $f$ (in particular, multilinear expectations). This construction corresponds to the standard independent rounding for variables with values between 0 and 1. The main challenge is that our random variables are fictitious.

Let $f$ be a 4-f.r.v. over $\mathbb{R}^n$. Suppose $f_i^2 \preceq f_i$ (in terms of an unspecified jointly-distributed 4-f.r.v.). Note that for real numbers $x$, the condition $x^2 \le x$ is equivalent to $x \in [0, 1]$.

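Lemma 6.4. Let $f$ be a 4-f.r.v. over $\mathbb{R}^n$ and let $i \in [n]$ such that $f_i^2 \preceq f_i$. Then, there exists a 4-f.r.v. $(f, \bar f_i)$ over $\mathbb{R}^{n+1}$ such that $\tilde{\mathbb{E}}_{f, \bar f_i} (\bar f_i^2 - \bar f_i)^2 = 0$ and for every polynomial $P$ which is linear in $\bar f_i$ and has degree at most 4,

$$\tilde{\mathbb{E}}_{f, \bar f_i} P(f, \bar f_i) = \tilde{\mathbb{E}}_f P(f, f_i).$$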

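Proof. We define the pseudo-expectation functional $\tilde{\mathbb{E}}_{f,\bar f_i}$ as follows: for every polynomial $P$ in $(f, \bar f_i)$ of degree at most 4, let $P'$ be the polynomial obtained by replacing $\bar f_i^2$ by $\bar f_i$ until $P'$ is (at most) linear in $\bar f_i$. (In other words, we reduce $P$ modulo the relation $\bar f_i^2 = \bar f_i$.) We define $\tilde{\mathbb{E}}_{f,\bar f_i} P(f, \bar f_i) = \tilde{\mathbb{E}}_f P'(f, f_i)$. With this definition, $\tilde{\mathbb{E}}_{f,\bar f_i}(\bar f_i^2 - \bar f_i)^2 = 0$. The operator $\tilde{\mathbb{E}}_{f,\bar f_i}$ is clearly linear (since $(P + Q)' = P' + Q'$). It remains to verify positivity. Let $P$ be a polynomial of degree at most 2 (so that $P^2$ has degree at most 4). We will show $\tilde{\mathbb{E}}_{f,\bar f_i} P^2(f, \bar f_i) \ge 0$. Without loss of generality, $P$ is linear in $\bar f_i$. We express $P = Q + \bar f_i R$, where $Q$ and $R$ are polynomials in $f$. Then $(P^2)' = Q^2 + 2\bar f_i QR + \bar f_i R^2$. Using our assumption $f_i^2 \preceq f_i$, we get

$$(P^2)'(f, f_i) = Q^2 + 2 f_i QR + f_i R^2 \succeq Q^2 + 2 f_i QR + f_i^2 R^2 = P^2(f, f_i).$$

It follows, as desired, that

$$\tilde{\mathbb{E}}_{f,\bar f_i} P^2 = \tilde{\mathbb{E}}_f (P^2)'(f, f_i) \ge \tilde{\mathbb{E}}_f P^2(f, f_i) \ge 0.$$

□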
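For intuition, the scalar shadow of the positivity argument can be checked numerically. The sketch below is only an illustration under stated assumptions: it replaces the fictitious variable $f_i$ by an actual number $p \in [0,1]$ and treats $Q$ and $R$ as numbers $q, r$. It compares the reduced square $(P^2)'$ with $P^2$ evaluated at $\bar f_i = p$, and also shows that $(P^2)'$ is exactly the expectation of $P^2$ when $\bar f_i$ is an independent Bernoulli($p$) rounding of $p$. The helper names are hypothetical.

```python
import random


def reduced_square(q: float, r: float, p: float) -> float:
    """(P^2)' at fbar = p, where P = q + fbar * r and fbar^2 has been reduced to fbar."""
    return q * q + 2 * p * q * r + p * r * r


def plain_square(q: float, r: float, p: float) -> float:
    """P^2 evaluated directly at fbar = p, without the reduction."""
    return (q + p * r) ** 2


def rounded_expectation(q: float, r: float, p: float) -> float:
    """Exact E[(q + b * r)^2] for b ~ Bernoulli(p), i.e. independent rounding of p."""
    return (1 - p) * q * q + p * (q + r) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        q, r, p = rng.uniform(-2, 2), rng.uniform(-2, 2), rng.random()
        gap = reduced_square(q, r, p) - plain_square(q, r, p)
        # The gap equals (p - p^2) * r^2 >= 0, the scalar form of f_i R^2 >= f_i^2 R^2.
        print(f"p={p:.3f}  gap={gap:.4f}  (p-p^2)r^2={(p - p * p) * r * r:.4f}")
```

The gap is exactly $(p - p^2)r^2$, which is nonnegative precisely because $p \in [0,1]$; in the lemma, the same role is played by the assumption $f_i^2 \preceq f_i$.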
Corollary 6.5. Let $f$ be a 4-f.r.v. over $\mathbb{R}^n$ and let $I \subseteq [n]$ be such that $f_i^2 \preceq f_i$ for all $i \in I$.

Then there exists a 4-f.r.v. $(f, \bar f_I)$ over $\mathbb{R}^{n+|I|}$ such that $\tilde{\mathbb{E}}_{f,\bar f_I}(\bar f_i^2 - \bar f_i)^2 = 0$ for all $i \in I$ and, for every polynomial $P$ which is multilinear in the variables $\{\bar f_i\}_{i \in I}$ and has degree at most 4,

$$\tilde{\mathbb{E}}_{f,\bar f_I} P(f, \bar f_I) = \tilde{\mathbb{E}}_f P(f, f_I).$$

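6.4 Dictatorship Test for Small-Set Expansion

Let $\Omega = \{0, \ldots, k-1\}$ and let $T_{1-\varepsilon}$ be the noise graph on $\Omega^R$ with second largest eigenvalue $1 - \varepsilon$. Let $f$ be a 4-f.r.v. over $L_2(\Omega^R)$. Suppose $f^2 \preceq f$ (in terms of an unspecified jointly-distributed 4-f.r.v.). Note that for real numbers $x$, the condition $x^2 \le x$ is equivalent to $x \in [0, 1]$.

The following theorem is an analog of the "Majority is Stablest" result [MOO05].

Theorem 6.6. Suppose $\tilde{\mathbb{E}}_f (\mathbb{E} f)^2 \le \delta^2$. Let $\tau = \tilde{\mathbb{E}}_f \sum_r (\mathrm{Inf}_r^{\le \ell} f)^2$ for $\ell = \Omega(\log(1/\delta))$. Then

$$\tilde{\mathbb{E}}_f \langle f, T_{1-\varepsilon} f\rangle \le \delta^{1+\Omega(\varepsilon)} + k^{O(\log(1/\delta))} \cdot \tau^{1/8}.$$

(Here, we assume that $\varepsilon$, $\delta$ and $\tau$ are sufficiently small.)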

The previous theorem is about graph expansion (measured by the quadratic form $\langle f, T_{1-\varepsilon} f\rangle$). The following lemma allows us to relate graph expansion to the 4-norm of the projection of $f$ onto the span of the eigenfunctions of $T_{1-\varepsilon}$ with significant eigenvalue. We will be able to bound this 4-norm in terms of the influences of $f$ (using the invariance principle in the previous section).
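To make "significant eigenvalue" concrete, one can check on a tiny example that the eigenfunctions of the noise graph are the characters $\chi_S$ with eigenvalue $(1-\varepsilon)^{|S|}$, and that an eigenvalue larger than $\lambda$ forces degree at most $\log(1/\lambda)/\varepsilon$ (the fact used later for the cube). The sketch assumes the binary case $k = 2$, i.e. the cube $\{\pm1\}^R$, and the standard noise operator that keeps each coordinate with probability $1-\varepsilon$ and otherwise resamples it uniformly; all function names are hypothetical.

```python
import itertools
import math


def noise_kernel(x, y, eps):
    """Transition probability of T_{1-eps} on {-1,1}^R: keep each bit w.p. 1-eps, else resample."""
    rho = 1.0 - eps
    return math.prod((1 + rho * xi * yi) / 2 for xi, yi in zip(x, y))


def character(S, x):
    """chi_S(x) = product of x_i over i in S (empty product is 1)."""
    return math.prod(x[i] for i in S)


def check_characters(R, eps, lam, tol=1e-12):
    """Verify T chi_S = (1-eps)^|S| chi_S and the degree bound for eigenvalues above lam."""
    cube = list(itertools.product([-1, 1], repeat=R))
    results = []
    for d in range(R + 1):
        for S in itertools.combinations(range(R), d):
            eig = (1 - eps) ** d
            for x in cube:
                lhs = sum(noise_kernel(x, y, eps) * character(S, y) for y in cube)
                assert abs(lhs - eig * character(S, x)) < tol
            if eig > lam:
                # (1-eps)^d > lam and log(1/(1-eps)) >= eps imply d <= log(1/lam)/eps.
                assert d <= math.log(1 / lam) / eps
            results.append((S, eig))
    return results


if __name__ == "__main__":
    for S, eig in check_characters(R=3, eps=0.2, lam=0.5):
        print(S, round(eig, 4))
```

The second largest eigenvalue is attained by the degree-1 characters and equals $1-\varepsilon$, matching the normalization of $T_{1-\varepsilon}$ in the text.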

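Lemma 6.7. Let $f$ be a 4-f.r.v. over $L_2(\Omega^R)$. Suppose $f^2 \preceq f$ (in terms of unspecified jointly-distributed 4-f.r.v.s). Then for all $\lambda > 0$,

$$\tilde{\mathbb{E}}_f \langle f, T_{1-\varepsilon} f\rangle \le \Big(\tilde{\mathbb{E}}_f \mathbb{E} f\Big)^{3/4} \Big(\tilde{\mathbb{E}}_f \mathbb{E}(P_{>\lambda} f)^4\Big)^{1/4} + \lambda\, \tilde{\mathbb{E}}_f \mathbb{E} f.$$

Here, $P_{>\lambda}$ is the projector onto the span of the eigenfunctions of $T_{1-\varepsilon}$ with eigenvalue larger than $\lambda$.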

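Proof. The following relation between polynomials holds (and $\lambda\,\mathbb{E} f^2 \preceq \lambda\,\mathbb{E} f$ since $f^2 \preceq f$):

$$\langle f, T_{1-\varepsilon} f\rangle \preceq \mathbb{E}\, f \cdot (P_{>\lambda} f) + \lambda\, \mathbb{E} f^2.$$

By Corollary 6.5, there exists a 4-f.r.v. $(f, \bar f)$ over $L_2(\Omega^R) \times L_2(\Omega^R)$ such that $\bar f^2 = \bar f$. Then,

$$\begin{aligned}
\tilde{\mathbb{E}}_f \mathbb{E}\, f \cdot (P_{>\lambda} f) &= \tilde{\mathbb{E}}_{f,\bar f} \mathbb{E}\, \bar f \cdot (P_{>\lambda} f) && \text{(using linearity in } \bar f)\\
&= \tilde{\mathbb{E}}_{f,\bar f} \mathbb{E}\, \bar f^3 \cdot (P_{>\lambda} f) && \text{(using } \bar f^2 = \bar f)\\
&\le \Big(\tilde{\mathbb{E}}_{f,\bar f} \mathbb{E}\, \bar f^4\Big)^{3/4} \cdot \Big(\tilde{\mathbb{E}}_{f,\bar f} \mathbb{E}(P_{>\lambda} f)^4\Big)^{1/4} && \text{(using Lemma A.5 (Hölder))}\\
&= \Big(\tilde{\mathbb{E}}_{f,\bar f} \mathbb{E}\, \bar f\Big)^{3/4} \cdot \Big(\tilde{\mathbb{E}}_{f,\bar f} \mathbb{E}(P_{>\lambda} f)^4\Big)^{1/4} && \text{(using } \bar f^2 = \bar f)\\
&= \Big(\tilde{\mathbb{E}}_f \mathbb{E}\, f\Big)^{3/4} \cdot \Big(\tilde{\mathbb{E}}_f \mathbb{E}(P_{>\lambda} f)^4\Big)^{1/4} && \text{(using linearity in } \bar f)
\end{aligned}$$

□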
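Proof of Theorem 6.6. By Lemma 6.7,

$$\tilde{\mathbb{E}}_f \langle f, T_{1-\varepsilon} f\rangle \le \Big(\tilde{\mathbb{E}}_f \mathbb{E} f\Big)^{3/4} \Big(\tilde{\mathbb{E}}_f \mathbb{E}(P_{>\lambda} f)^4\Big)^{1/4} + \lambda\, \tilde{\mathbb{E}}_f \mathbb{E} f.$$

Using Corollary 6.3,

$$\tilde{\mathbb{E}}_f \langle f, T_{1-\varepsilon} f\rangle \le 2^{O(\ell)} \cdot \Big(\tilde{\mathbb{E}}_f \mathbb{E} f\Big)^{3/4} \Big(\tilde{\mathbb{E}}_f (\mathbb{E} f)^2 + \sqrt{\tau} \cdot k^{O(\ell)}\Big)^{1/4} + \lambda\, \tilde{\mathbb{E}}_f \mathbb{E} f.$$

Here, $\ell = \log(1/\lambda)/\varepsilon$. Using the relation $f^2 \preceq f$ and our assumption $\tilde{\mathbb{E}}_f (\mathbb{E} f)^2 \le \delta^2$, we get $\tilde{\mathbb{E}}_f \mathbb{E} f^2 \le \tilde{\mathbb{E}}_f \mathbb{E} f \le (\tilde{\mathbb{E}}_f (\mathbb{E} f)^2)^{1/2} \le \delta$ (by Cauchy-Schwarz). Hence, since $2^{O(\ell)} = (1/\lambda)^{O(1/\varepsilon)}$,

$$\tilde{\mathbb{E}}_f \langle f, T_{1-\varepsilon} f\rangle \le (1/\lambda)^{O(1/\varepsilon)} \delta^{5/4} + k^{O(\ell)} \tau^{1/8} + \lambda \delta.$$

To balance the terms $(1/\lambda)^{O(1/\varepsilon)} \delta^{5/4}$ and $\lambda \delta$, we choose $\lambda = \delta^{\Omega(\varepsilon)}$ (so that $\ell = O(\log(1/\delta))$). We conclude the desired bound,

$$\tilde{\mathbb{E}}_f \langle f, T_{1-\varepsilon} f\rangle \le \delta^{1+\Omega(\varepsilon)} + k^{O(\log(1/\delta))} \cdot \tau^{1/8}.$$

□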

6.5 Dictatorship Test for Unique Games 

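Let $\Omega = \mathbb{Z}_k$ (the cyclic group of order $k$) and let $f$ be a 4-f.r.v. over $L_2(\Omega \times \Omega^R)$. Here, $f(a, x)$ is intended to be a 0/1 variable indicating whether symbol $a$ is assigned to the point $x$.

The following graph $T'_{1-\varepsilon}$ on $\Omega \times \Omega^R$ corresponds to the 2-query dictatorship test for Unique Games [KKMO04]:

$$T'_{1-\varepsilon} f(a, x) = \mathop{\mathbb{E}}_{c \in \Omega}\ \mathop{\mathbb{E}}_{y \sim_{1-\varepsilon} x} f(a + c, y - c \cdot \mathbf{1}).$$

Here, $y \sim_{1-\varepsilon} x$ means that $y$ is a random neighbor of $x$ in the graph $T_{1-\varepsilon}$ (the $\varepsilon$-noise graph on $\Omega^R$).

We define $\bar f(x) := \sum_{c \in \Omega} f(c, x - c \cdot \mathbf{1})$. (We think of $\bar f$ as a variable over $L_2(\Omega^R)$.) Then the following polynomial identity (in $f$) holds:

$$\langle f, T'_{1-\varepsilon} f\rangle = \langle \bar f, T_{1-\varepsilon} \bar f\rangle.$$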

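Theorem 6.8. Suppose $f^2 \preceq f$ and $\tilde{\mathbb{E}}_f (\mathbb{E} f)^2 \le \delta^2$. Let $\tau = \tilde{\mathbb{E}}_f \sum_r (\mathrm{Inf}_r^{\le \ell} \bar f)^2$ for $\ell = \Omega(\log(1/\delta))$. Then,

$$\tilde{\mathbb{E}}_f \langle f, T'_{1-\varepsilon} f\rangle \le \delta^{1+\Omega(\varepsilon)} + k^{O(\log(1/\delta))} \cdot \tau^{1/8}.$$

(Here, we assume that $\varepsilon$, $\delta$ and $\tau$ are sufficiently small.)

Proof. Apply Theorem 6.6 to bound $\tilde{\mathbb{E}}_f \langle \bar f, T_{1-\varepsilon} \bar f\rangle$. Use the fact that $\mathbb{E} \bar f = \mathbb{E} f$ (as polynomials in $f$). □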

6.6 Influence Decoding 

Let $\mathcal{U}$ be a unique game with vertex set $V$ and alphabet $[R]$. Recall that we represent $\mathcal{U}$ as a distribution over triples $(u, v, \pi)$ where $u, v \in V$ and $\pi$ is a permutation of $[R]$. The triples encode the constraints of $\mathcal{U}$. We assume that the unique game $\mathcal{U}$ is regular in the sense that every vertex participates in the same fraction of constraints.
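Let $\Omega = \mathbb{Z}_k$ (the cyclic group of order $k$). We reduce $\mathcal{U}$ to a unique game $\mathcal{W} = \mathcal{W}_{\varepsilon,k}(\mathcal{U})$ with vertex set $V \times \Omega^R$ and alphabet $\Omega$. Let $f = \{f_u\}_{u \in V}$ be a variable over $L_2(\Omega \times \Omega^R)^V$. The unique game $\mathcal{W}$ corresponds to the following quadratic form in $f$:

$$\langle f, \mathcal{W} f\rangle = \mathop{\mathbb{E}}_{u \in V}\ \mathop{\mathbb{E}}_{\substack{(u,v,\pi) \sim \mathcal{U} \mid u \\ (u,v',\pi') \sim \mathcal{U} \mid u}} \langle f_v^{\pi}, T'_{1-\varepsilon} f_{v'}^{\pi'}\rangle.$$

Here, $(u, v, \pi) \sim \mathcal{U} \mid u$ denotes a random constraint of $\mathcal{U}$ incident to vertex $u$, the graph $T'_{1-\varepsilon}$ corresponds to the dictatorship test for Unique Games defined in Section 6.5, and $f_v^{\pi}(a, x) = f_v(a, \pi.x)$ is the function obtained by permuting the last $R$ coordinates according to $\pi$ (where $\pi.x(i) = x_{\pi(i)}$).

We define $g_u = \mathbb{E}_{(u,v,\pi) \sim \mathcal{U} \mid u} f_v^{\pi}$. Then,

$$\langle f, \mathcal{W} f\rangle = \mathop{\mathbb{E}}_{u \in V} \langle g_u, T'_{1-\varepsilon} g_u\rangle. \qquad (6.2)$$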

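Bounding the value of SoS solutions. Let $f = \{f_u\}_{u \in V}$ be a solution to the level-$d$ SoS relaxation for the unique game $\mathcal{W}$. In particular, $f$ is a $d$-f.r.v. over $L_2(\Omega \times \Omega^R)^V$. Furthermore, $\tilde{\mathbb{E}}_f (\mathbb{E} f_u)^2 \le 1/k^2$ for all vertices $u \in V$.

By applying Theorem 6.8 to (6.2), we can bound the objective value of $f$:

$$\tilde{\mathbb{E}}_f \langle f, \mathcal{W} f\rangle \le 1/k^{1+\Omega(\varepsilon)} + k^{O(\log k)} \Big(\mathop{\mathbb{E}}_{u \in V} \tau_u\Big)^{1/8},$$

where $\tau_u = \tilde{\mathbb{E}}_f \sum_r (\mathrm{Inf}_r^{\le \ell} \bar g_u)^2$, $\bar g_u = \mathbb{E}_{(u,v,\pi) \sim \mathcal{U} \mid u} \bar f_v^{\pi}$, and $\bar f_v(x) = \sum_{c \in \Omega} f_v(c, x - c \cdot \mathbf{1})$. Note that $\mathrm{Inf}_r^{\le \ell}$ is a positive semidefinite form.

Let $h$ be the level-$d/2$ fictitious random variable over $L_2(V \times [R])$ with $h(u, r) = \mathrm{Inf}_r^{\le \ell} \bar f_u$. Let $G_{\mathcal{U}}$ be the label-extended graph of the unique game $\mathcal{U}$. Then the previous bound on $\tau_u$ shows that $\sum_{u \in V} \tau_u \le R \cdot \|G_{\mathcal{U}} h\|^2$. Lemma A.1 shows that $h^2 \preceq h$. On the other hand, $\sum_r h(u, r) \le \ell \|\bar f_u\|^2$ (bound on the total influence of low-degree Fourier polynomials). In particular, $\mathbb{E} h \le \ell\, \mathbb{E}_{u \in V} \|\bar f_u\|^2 / R$. Since $f$ is a valid SoS solution for the unique game $\mathcal{W}$, we have $\tilde{\mathbb{E}}_f \|\bar f_u\|^d \le 1/k^{d/2}$ for all $u \in V$. (Here, we assume that $d$ is even.) It follows that $\tilde{\mathbb{E}}_h (\mathbb{E} h)^{d/2} \le (\ell/R)^{d/2}$.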

The arguments in this subsection imply the following theorem. 

Theorem 6.9. The optimal value of the level-$d$ SoS relaxation for the unique game $\mathcal{W} = \mathcal{W}_{\varepsilon,k}(\mathcal{U})$ is bounded from above by

$$1/k^{\Omega(\varepsilon)} + k^{O(\log k)} \Big(R \cdot \max_h \tilde{\mathbb{E}}_h \|G_{\mathcal{U}} h\|^2\Big)^{1/8},$$

where the maximum is over all level-$d/2$ fictitious random variables $h$ over $L_2(V \times [R])$ satisfying $h^2 \preceq h$ and $\tilde{\mathbb{E}}_h (\mathbb{E} h)^{d/2} \le (\ell/R)^{d/2}$.

Remark 6.10. Since the quadratic form $\|G_{\mathcal{U}} h\|^2$ has only nonnegative coefficients (in the standard basis), we can use Corollary 6.5 to ensure that the level-$d/2$ random variable $h$ satisfies, in addition, $h^2 = h$.
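6.7 Certifying Small-Set Expansion

Let $T_{1-\varepsilon}$ be the noise graph on $\{\pm 1\}^R$ with second largest eigenvalue $1 - \varepsilon$.

Theorem 6.11. Let $f$ be a level-4 fictitious random variable over $L_2(\{\pm 1\}^R)$. Suppose that $f^2 \preceq f$ (in terms of unspecified jointly-distributed level-4 fictitious random variables) and that $\tilde{\mathbb{E}}_f (\mathbb{E} f)^2 \le \delta^2$. Then,

$$\tilde{\mathbb{E}}_f \langle f, T_{1-\varepsilon} f\rangle \le \delta^{1+\Omega(\varepsilon)}.$$

Proof. By Lemma 6.7 (applying it for the case $\Omega = \{0, 1\}$), for every $\lambda > 0$,

$$\tilde{\mathbb{E}}_f \langle f, T_{1-\varepsilon} f\rangle \le \Big(\tilde{\mathbb{E}}_f \mathbb{E} f\Big)^{3/4} \Big(\tilde{\mathbb{E}}_f \mathbb{E}(P_{>\lambda} f)^4\Big)^{1/4} + \lambda\, \tilde{\mathbb{E}}_f \mathbb{E} f.$$

For the graph $T_{1-\varepsilon}$, the eigenfunctions with eigenvalue larger than $\lambda$ are characters with degree at most $\log(1/\lambda)/\varepsilon$. Hence, Lemma 5.1 implies $\mathbb{E}(P_{>\lambda} f)^4 \preceq (1/\lambda)^{O(1/\varepsilon)} \|f\|^4$. Since $f^2 \preceq f$, we have $\|f\|^4 \preceq (\mathbb{E} f)^2$. Hence, $\tilde{\mathbb{E}}_f \mathbb{E}(P_{>\lambda} f)^4 \le (1/\lambda)^{O(1/\varepsilon)} \delta^2$. Plugging in (and using $\tilde{\mathbb{E}}_f \mathbb{E} f \le (\tilde{\mathbb{E}}_f (\mathbb{E} f)^2)^{1/2} \le \delta$ by Cauchy-Schwarz), we get

$$\tilde{\mathbb{E}}_f \langle f, T_{1-\varepsilon} f\rangle \le (1/\lambda)^{O(1/\varepsilon)} \delta^{5/4} + \lambda \cdot \delta.$$

To balance the terms, we choose $\lambda = \delta^{\Omega(\varepsilon)}$, which gives the desired bound. □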
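The balancing step can be illustrated numerically. The sketch below is a hypothetical instantiation, not the constants of the proof: it replaces the unspecified $O(1/\varepsilon)$ exponent by $C/\varepsilon$ for an explicit constant $C \ge 1$ and picks $\lambda = \delta^{\varepsilon/(8C)}$ as one concrete choice of $\lambda = \delta^{\Omega(\varepsilon)}$. With this choice the first term becomes exactly $\delta^{9/8}$ and the second is $\delta^{1+\varepsilon/(8C)}$, so for $\varepsilon \le C$ the sum is at most $2\delta^{1+\varepsilon/(8C)}$.

```python
def sse_bound(delta: float, lam: float, eps: float, C: float) -> float:
    """(1/lam)^(C/eps) * delta^(5/4) + lam * delta; C is a stand-in for the hidden O(.) constant."""
    return (1.0 / lam) ** (C / eps) * delta ** 1.25 + lam * delta


def balanced_lambda(delta: float, eps: float, C: float) -> float:
    """One concrete choice lam = delta^(eps / (8C)), which is of the form delta^{Omega(eps)}."""
    return delta ** (eps / (8.0 * C))


if __name__ == "__main__":
    eps, C = 0.2, 2.0
    for delta in (1e-2, 1e-4, 1e-8):
        lam = balanced_lambda(delta, eps, C)
        print(delta, sse_bound(delta, lam, eps, C), 2 * delta ** (1 + eps / (8 * C)))
```

Any smaller exponent for $\lambda$ would also work; the point is only that a power of $\delta$ proportional to $\varepsilon$ makes both terms $\delta^{1+\Omega(\varepsilon)}$.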

6.8 Putting Things Together 

Let $T_{1-\eta}$ be the noise graph on $\{\pm 1\}^R$ with second largest eigenvalue $1 - \eta$. Let $\mathcal{U} = \mathcal{U}_{\eta,R}$ be an instance of Unique Games with label-extended graph $G_{\mathcal{U}} = T_{1-\eta}$ (e.g., the construction in [KV05]).

Combining Theorem 6.9 (with $d = 4$) and Theorem 6.11 gives the following result.

Theorem 6.12. The optimal value of the level-8 SoS relaxation for the unique game $\mathcal{W} = \mathcal{W}_{\varepsilon,k}(\mathcal{U}_{\eta,R})$ is bounded from above by

$$1/k^{\Omega(\varepsilon)} + k^{O(\log k)} \cdot R^{-\Omega(\eta)}.$$

In particular, the optimal value of the relaxation is close to $1/k^{\Omega(\varepsilon)}$ if $\log R \gg (\log k)^2/\eta$.

6.9 Refuting Instances based on Short Code 

Let $\mathcal{U} = \mathcal{U}_{\eta,R}$ be an instance of Unique Games according to the basic construction in [BGH+11]. (The label-extended graph of $\mathcal{U}$ will be a subgraph of $T_{1-\eta}$ induced by the subset of $\{\pm 1\}^R$ corresponding to a Reed-Muller code, that is, evaluations of low-degree $\mathbb{F}_2$-polynomials.)

Let $\mathcal{W}' = \mathcal{W}'_{\varepsilon,k}(\mathcal{U}_{\eta,R})$ be the unique game obtained by applying the short-code alphabet reduction of [BGH+11].

The following analog of Theorem 6.12 holds.

Theorem 6.13. The optimal value of the level-8 SoS relaxation for the unique game $\mathcal{W}' = \mathcal{W}'_{\varepsilon,k}(\mathcal{U}_{\eta,R})$ is bounded from above by

In particular, the optimal value of the relaxation is close to $1/k^{\Omega(\varepsilon)}$ if $\log R \gg (\log k)^2/\eta$.
The proof of Theorem 6.13 is almost literally the same as the proof of Theorem 6.12. In the following, we sketch the main arguments for why the proof doesn't have to change. First, several of the results of the previous sections apply to general graphs and instances of Unique Games. In particular, Lemma 6.7 applies to general graphs and Theorem 6.9 applies to general gadget-composed instances of unique games, assuming a "Majority is Stablest" result for the gadget. In fact, the only parts that require further justification are the invariance principle (Theorem 6.2) and the hypercontractivity bound (Lemma 5.1). Both the invariance principle and the hypercontractivity bound are about the fourth moment of a low-degree Fourier polynomial (whose coefficients are fictitious random variables). For the construction of [BGH+11], we need to argue about the fourth moment with respect to a different distribution over inputs. (Instead of the uniform distribution, [BGH+11] considers a distribution over inputs related to the Reed-Muller code.) However, this distribution happens to be $k$-wise independent for $k/4$ larger than the degree of our Fourier polynomial. Hence, as a degree-4 polynomial in the Fourier coefficients, the fourth moment with respect to the [BGH+11] input distribution is the same as with respect to the uniform distribution, which we considered here.

7 Hypercontractivity of random operators 

We already saw that the Tensor-SDP algorithm provides non-trivial guarantees on the $2\to4$ norms of the projector to low-degree polynomials. In this section we show that it also works for a natural but very different class of instances, namely random linear operators.

Let $A = \sum_{i=1}^m e_i a_i^T / \sqrt{n}$, where $e_i$ is the vector with a 1 in the $i$th position, and each $a_i$ is chosen i.i.d. from a distribution $\mathcal{D}$ on $\mathbb{R}^n$. Three natural possibilities are

1. $\mathcal{D}_{\mathrm{sign}}$: the uniform distribution over $\{-1, 1\}^n$;

2. $\mathcal{D}_{\mathrm{Gaussian}}$: a vector of $n$ independent Gaussians with mean zero and variance 1;

3. $\mathcal{D}_{\mathrm{unit}}$: a uniformly random unit vector on $\mathbb{R}^n$.

Our arguments will apply to any of these cases, or even to more general nearly-unit vectors with bounded sub-Gaussian moment (details below).

Before discussing the performance of Tensor-SDP, we will discuss how the $2\to4$ norm of $A$ behaves as a function of $n$ and $m$. We can gain intuition by considering two limits in the case of $\mathcal{D}_{\mathrm{Gaussian}}$. If $n = 1$, then $\|A\|_{2\to4} = \|a\|_4$ for a random Gaussian vector $a$. For large $m$, $\|a\|_4$ is likely to be close to $3^{1/4}$, the fourth root of 3, which is the fourth moment of a mean-zero unit-variance Gaussian. By Dvoretzky's theorem [Pis99], this behavior can be shown to extend to higher values of $n$. Indeed, there is a universal $c > 0$ such that if $n \le c\,\varepsilon^2 \sqrt{m}$, then w.h.p. $\|A\|_{2\to4} \le 3^{1/4} + \varepsilon$. In this case, the maximum value of $\|Ax\|_4$ looks roughly the same as the average or the minimum value, and we also have $\|Ax\|_4 \ge (3^{1/4} - \varepsilon)\|x\|_2$ for all $x \in \mathbb{R}^n$. In the cases of $\mathcal{D}_{\mathrm{sign}}$ and $\mathcal{D}_{\mathrm{unit}}$, the situation is somewhat more complicated, but for large $n$, their behavior becomes similar to the Gaussian case.

On the other hand, a simple argument (a variant of Corollary 10.2) shows that $\|A\|_{2\to4} \ge n^{1/2}/m^{1/4}$ for any (not only random) $m \times n$ matrix with all entries $\pm 1/\sqrt{n}$. A nearly identical bound applies for the case when the $a_i$ are arbitrary unit or near-unit vectors. Thus, in the regime where $n \ge \Omega(\sqrt{m})$, we always have $\|A\|_{2\to4} \ge \Omega(1)$.

The following theorem shows that Tensor-SDP achieves approximately the correct an- 
swer in both regimes. 
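Theorem 7.1. Let $a_1, \ldots, a_m$ be drawn i.i.d. from a distribution $\mathcal{D}$ on $\mathbb{R}^n$ with $\mathcal{D} \in \{\mathcal{D}_{\mathrm{Gaussian}}, \mathcal{D}_{\mathrm{sign}}, \mathcal{D}_{\mathrm{unit}}\}$, and let $A = \sum_{i=1}^m e_i a_i^T / \sqrt{n}$. Then w.h.p. $\text{Tensor-SDP}(A) \le 3 + c \max\big(n/\sqrt{m},\ n^2/m\big)$ for some constant $c > 0$.

From Theorem 7.1 and the fact that $\|A\|_{2\to4}^4 \le \text{Tensor-SDP}(A)$, we obtain:

Corollary 7.2. Let $A$ be as in Theorem 7.1. Then there exists $c > 0$ such that w.h.p.

$$\|A\|_{2\to4} \le \begin{cases} 3^{1/4} + c\, n/\sqrt{m} & \text{if } n \le \sqrt{m}, \\ c\, n^{1/2}/m^{1/4} & \text{if } n > \sqrt{m}. \end{cases} \qquad (7.1)$$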

Before proving Theorem 7.1, we introduce some more notation. This will in fact imply that Theorem 7.1 applies to a broader class of distributions. For a distribution $\mathcal{D}$ on $\mathbb{R}^N$, define the $\psi_p$ norm $\|\mathcal{D}\|_{\psi_p}$ to be the smallest $C > 0$ such that

$$\max_{v \in S^{N-1}} \mathop{\mathbb{E}}_{a \sim \mathcal{D}} \exp\Big(\frac{|\langle v, a\rangle|^p N^{p/2}}{C^p}\Big) \le 2, \qquad (7.2)$$

or $\infty$ if no finite such $C$ exists. We depart from the normal convention by including a factor of $N^{p/2}$ in the definition, to match the scale of [ALPTJ11]. The $\psi_2$ norm (technically a seminorm) is also called the sub-Gaussian norm of the distribution. One can verify that for each of the above examples (sign, unit and Gaussian vectors), $\|\mathcal{D}\|_{\psi_2} \le O(1)$.

We also require that $\mathcal{D}$ satisfies a boundedness condition with constant $K \ge 1$, defined as

$$\Pr\Big[\max_{i \in [m]} \|a_i\|_2 > K \max\big(1, (m/N)^{1/4}\big)\Big] \le e^{-\sqrt{N}}. \qquad (7.3)$$

Similarly, $K$ can be taken to be $O(1)$ in each case that we consider.

We will require the following result of [ALPTJ10, ALPTJ11] about the convergence of sums of i.i.d. rank-one matrices.

Lemma 7.3 ([ALPTJ11]). Let $\mathcal{D}'$ be a distribution on $\mathbb{R}^N$ such that $\mathbb{E}_{v \sim \mathcal{D}'} vv^T = I$, $\|\mathcal{D}'\|_{\psi_1} \le \psi$, and (7.3) holds for $\mathcal{D}'$ with constant $K$. Let $v_1, \ldots, v_m$ be drawn i.i.d. from $\mathcal{D}'$. Then with probability $\ge 1 - 2\exp(-c\sqrt{N})$, we have

$$(1 - \varepsilon) I \preceq \frac{1}{m} \sum_{i=1}^m v_i v_i^T \preceq (1 + \varepsilon) I, \qquad (7.4)$$

where $\varepsilon = C(\psi + K)^2 \max\big(N/m, \sqrt{N/m}\big)$ with $c, C > 0$ universal constants.

The $N \le m$ case (when the $\sqrt{N/m}$ term is applicable) was proven in Theorem 1 of [ALPTJ11], and the $N > m$ case (i.e., when the max is achieved by $N/m$) was proven in Theorem 2 of [ALPTJ11] (see also Theorem 3.13 of [ALPTJ10]).

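Proof of Theorem 7.1. Define $A_{2,2} = \frac{1}{m} \sum_{i=1}^m a_i a_i^T \otimes a_i a_i^T$. For $n^2 \times n^2$ real matrices $X, Y$, define $\langle X, Y\rangle := \mathrm{Tr}\, X^T Y / n^2 = n^{-2} \sum_{i,j} X_{i,j} Y_{i,j}$. Additionally, define the convex set $\mathcal{X}$ to be the set of $n^2 \times n^2$ real matrices $X = (X_{(i_1,i_2),(i_3,i_4)})_{i_1,i_2,i_3,i_4 \in [n]}$ with $X \succeq 0$, $\mathbb{E}_{i,j \in [n]} X_{(i,i),(j,j)} = 1$ and $X_{(i_1,i_2),(i_3,i_4)} = X_{(i_{\pi(1)},i_{\pi(2)}),(i_{\pi(3)},i_{\pi(4)})}$ for any permutation $\pi \in S_4$. Finally, let $h_{\mathcal{X}}(Y) = \max_{X \in \mathcal{X}} \langle X, Y\rangle$. It is straightforward to show (cf. Lemma 9.3) that

$$\text{Tensor-SDP}(A) = h_{\mathcal{X}}(A_{2,2}) = \max_{X \in \mathcal{X}} \langle X, A_{2,2}\rangle. \qquad (7.5)$$

We note that if $\mathcal{X}$ were defined without the symmetry constraint, it would simply be the convex hull of $xx^T$ for unit vectors $x \in \mathbb{R}^{n^2}$, and $\text{Tensor-SDP}(A)$ would simply be the largest eigenvalue of $A_{2,2}$. However, we will later see that the symmetry constraint is crucial to $\text{Tensor-SDP}(A)$ being $O(1)$.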

Our strategy will be to analyze $A_{2,2}$ by applying Lemma 7.3 to the vectors $v_i := \Sigma^{-1/2}(a_i \otimes a_i)$, where $\Sigma = \mathbb{E}\, a_i a_i^T \otimes a_i a_i^T$ and $\Sigma^{-1/2}$ denotes the pseudo-inverse of $\Sigma^{1/2}$. First, observe that, just as the $\psi_2$ norm of the distribution over $a_i$ is constant, a similar calculation can verify that the $\psi_1$ norm of the distribution over $a_i \otimes a_i$ is also constant. Next, we have to argue that $\Sigma^{-1/2}$ does not increase the norm by too much.

To do so, we compute $\Sigma$ for each distribution over $a_i$ that we have considered. Let $F$ be the operator satisfying $F(x \otimes y) = y \otimes x$ for any $x, y \in \mathbb{R}^n$; explicitly, $F = P_n((1,2))$ from (9.9). Define

$$\Phi := \sum_{i=1}^n e_i \otimes e_i, \qquad (7.6)$$

$$\Delta := \sum_{i=1}^n e_i e_i^T \otimes e_i e_i^T. \qquad (7.7)$$

Direct calculations (omitted) can verify that the cases of random Gaussian vectors, random unit vectors and random $\pm 1$ vectors yield, respectively,

$$\Sigma_{\mathrm{Gaussian}} = I + F + \Phi\Phi^T, \qquad (7.8a)$$

$$\Sigma_{\mathrm{unit}} = \frac{n}{n+2} \Sigma_{\mathrm{Gaussian}}, \qquad (7.8b)$$

$$\Sigma_{\mathrm{sign}} = \Sigma_{\mathrm{Gaussian}} - 2\Delta. \qquad (7.8c)$$

In each case, the smallest nonzero eigenvalue of $\Sigma$ is $\Omega(1)$, so $v_i = \Sigma^{-1/2}(a_i \otimes a_i)$ has $\psi_1 \le O(1)$ and satisfies the boundedness condition (7.3) with $K \le O(1)$.
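The formula (7.8c) for random sign vectors can be verified exactly for small $n$ by averaging over all of $\{\pm1\}^n$. The sketch below indexes $\mathbb{R}^n \otimes \mathbb{R}^n$ by pairs $(i,k)$, uses $\Sigma_{(i,k),(j,l)} = \mathbb{E}\, a_i a_k a_j a_l$ (the entries of $\mathbb{E}\, aa^T \otimes aa^T$), and builds $I$, $F$, $\Phi\Phi^T$ and $\Delta$ from their entrywise descriptions. It is a sanity check of the stated identity with hypothetical helper names, not part of the proof.

```python
import itertools
from fractions import Fraction


def delta(*idx):
    """1 if all given indices are equal, else 0."""
    return 1 if len(set(idx)) == 1 else 0


def sigma_sign(n):
    """Exact Sigma_{(i,k),(j,l)} = E[a_i a_k a_j a_l] for a uniform in {-1, 1}^n."""
    pairs = list(itertools.product(range(n), repeat=2))
    vecs = list(itertools.product([-1, 1], repeat=n))
    return {
        ((i, k), (j, l)): Fraction(sum(a[i] * a[k] * a[j] * a[l] for a in vecs), len(vecs))
        for (i, k) in pairs
        for (j, l) in pairs
    }


def sigma_formula(n):
    """Entries of I + F + Phi Phi^T - 2 Delta, i.e. Sigma_Gaussian - 2 Delta."""
    pairs = list(itertools.product(range(n), repeat=2))
    out = {}
    for (i, k) in pairs:
        for (j, l) in pairs:
            ident = delta(i, j) * delta(k, l)   # I
            swap = delta(i, l) * delta(k, j)    # F: x (x) y -> y (x) x
            phi = delta(i, k) * delta(j, l)     # Phi Phi^T
            out[((i, k), (j, l))] = Fraction(ident + swap + phi - 2 * delta(i, j, k, l))
    return out


if __name__ == "__main__":
    print(sigma_sign(3) == sigma_formula(3))
```

The three Kronecker-delta terms are the Gaussian (Wick) pairings; sign vectors agree with them except on the diagonal $i = j = k = l$, where the Gaussian fourth moment is 3 but $a_i^4 = 1$, which is exactly the $-2\Delta$ correction.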

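Thus, we can apply Lemma 7.3 (with $N = \operatorname{rank} \Sigma \le n^2$ and $\varepsilon := c \max(n/\sqrt{m}, n^2/m)$) and find that in each case w.h.p.

$$A_{2,2} = \frac{1}{m} \sum_{i=1}^m a_i a_i^T \otimes a_i a_i^T \preceq (1 + \varepsilon) \Sigma \preceq (1 + \varepsilon)(I + F + \Phi\Phi^T). \qquad (7.9)$$

Since $h_{\mathcal{X}}(Y) \ge 0$ whenever $Y \succeq 0$, we have $h_{\mathcal{X}}(A_{2,2}) \le (1 + \varepsilon) h_{\mathcal{X}}(\Sigma)$. Additionally, since $h_{\mathcal{X}}$ is a support function,

$$h_{\mathcal{X}}(\Sigma) \le h_{\mathcal{X}}(I + F + \Phi\Phi^T) \le h_{\mathcal{X}}(I) + h_{\mathcal{X}}(F) + h_{\mathcal{X}}(\Phi\Phi^T),$$

so we can bound each of the three terms separately.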
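Observe that $I$ and $F$ each have largest eigenvalue equal to 1, and so $h_{\mathcal{X}}(I) \le 1$ and $h_{\mathcal{X}}(F) \le 1$. (In fact, these are both equalities.)

However, the single nonzero eigenvalue of $\Phi\Phi^T$ is equal to $n$. Here we will need to use the symmetry constraint on $\mathcal{X}$. Let $X^{\Gamma}$ be the matrix with entries $X^{\Gamma}_{(i_1,i_2),(i_3,i_4)} := X_{(i_1,i_4),(i_3,i_2)}$. If $X \in \mathcal{X}$ then $X = X^{\Gamma}$. Additionally, $\langle X, Y\rangle = \langle X^{\Gamma}, Y^{\Gamma}\rangle$. Thus

$$h_{\mathcal{X}}(\Phi\Phi^T) = h_{\mathcal{X}}\big((\Phi\Phi^T)^{\Gamma}\big) \le \lambda_{\max}\big((\Phi\Phi^T)^{\Gamma}\big) = 1.$$

This last equality follows from the fact that $(\Phi\Phi^T)^{\Gamma} = F$.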

Putting together these ingredients, we obtain the proof of the theorem. □ 
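The key identity $(\Phi\Phi^T)^{\Gamma} = F$ used in the last step, together with the eigenvalue facts for $F$ and $\Phi\Phi^T$, can be checked directly for small $n$. The sketch below works with entrywise definitions (hypothetical helper names, plain Python) and verifies that the partial transpose $\Gamma$ maps $\Phi\Phi^T$ to the swap operator $F$, that $F$ is a symmetric involution (so its eigenvalues are $\pm 1$), and that $\Phi$ is an eigenvector of $\Phi\Phi^T$ with eigenvalue $n$.

```python
import itertools


def swap_entry(i1, i2, i3, i4):
    """F_{(i1,i2),(i3,i4)} for the swap F(x (x) y) = y (x) x."""
    return 1 if (i1 == i4 and i2 == i3) else 0


def phiphi_entry(i1, i2, i3, i4):
    """(Phi Phi^T)_{(i1,i2),(i3,i4)} with Phi = sum_i e_i (x) e_i."""
    return 1 if (i1 == i2 and i3 == i4) else 0


def gamma(entry):
    """Partial transpose: X^Gamma_{(i1,i2),(i3,i4)} = X_{(i1,i4),(i3,i2)}."""
    return lambda i1, i2, i3, i4: entry(i1, i4, i3, i2)


def check(n):
    idx = list(itertools.product(range(n), repeat=2))
    g = gamma(phiphi_entry)
    # (Phi Phi^T)^Gamma equals F entrywise.
    assert all(g(a, b, c, d) == swap_entry(a, b, c, d) for (a, b) in idx for (c, d) in idx)
    # F is symmetric and F F = I, hence its eigenvalues are +1 and -1.
    for (a, b) in idx:
        for (c, d) in idx:
            assert swap_entry(a, b, c, d) == swap_entry(c, d, a, b)
            ff = sum(swap_entry(a, b, s, t) * swap_entry(s, t, c, d) for (s, t) in idx)
            assert ff == (1 if (a, b) == (c, d) else 0)
    # Phi Phi^T Phi = n Phi, so the single nonzero eigenvalue of Phi Phi^T is n.
    phi = {(a, b): 1 if a == b else 0 for (a, b) in idx}
    for (a, b) in idx:
        assert sum(phiphi_entry(a, b, c, d) * phi[(c, d)] for (c, d) in idx) == n * phi[(a, b)]
    return True


if __name__ == "__main__":
    print(all(check(n) for n in range(1, 5)))
```

This is the mechanism behind the symmetry constraint's advantage: the eigenvalue bound sees $\lambda_{\max}(\Phi\Phi^T) = n$, while the symmetrized set $\mathcal{X}$ only sees $\lambda_{\max}(F) = 1$.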

It may seem surprising that the factor of $3^{1/4}$ emerges even for matrices with, say, $\pm 1$ entries. An intuitive justification for this is that even if the columns of $A$ are not Gaussian vectors, most linear combinations of them resemble Gaussians. The following lemma shows that this behavior begins as soon as $n$ is $\omega(1)$.
Lemma 7.4. Let $A = \sum_{i=1}^m e_i a_i^T / \sqrt{n}$ with $\mathbb{E}_i \|a_i\|_2^4 \ge 1$. Then $\|A\|_{2\to4} \ge (3/(1 + 2/n))^{1/4}$.

To see that the denominator cannot be improved in general, observe that when $n = 1$ a random sign matrix will have $2\to4$ norm equal to 1.

Proof. Choose $x \in \mathbb{R}^n$ to be a random Gaussian vector such that $\mathbb{E}_x \|x\|_2^2 = 1$. Then

$$\mathop{\mathbb{E}}_x \|Ax\|_4^4 = \mathop{\mathbb{E}}_x \mathop{\mathbb{E}}_i n^{-2} (a_i^T x)^4 = 3\, \mathop{\mathbb{E}}_i \|a_i\|_2^4 \ge 3. \qquad (7.10)$$

The middle equality comes from the fact that $a_i^T x / \sqrt{n}$ is a Gaussian random variable with mean zero and variance $\|a_i\|_2^2$. On the other hand, $\mathbb{E}_x \|x\|_2^4 = 1 + 2/n$. Thus, there must exist an $x$ for which $\|Ax\|_4^4 / \|x\|_2^4 \ge 3/(1 + 2/n)$. □
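The averaging argument can be simulated. The sketch below fixes normalization conventions that are our reading of the proof rather than something stated verbatim: $A$ has rows $a_i^T/\sqrt{n}$ with $a_i$ uniform in $\{\pm1\}^n$, $\|x\|_2^2$ is the average of $x_j^2$, $\|Ax\|_4^4$ is the average of $(Ax)_i^4$, and $x$ has i.i.d. standard Gaussian entries. Under these conventions each $(Ax)_i$ is exactly standard Gaussian, so the ratio of sample averages should approach $3/(1+2/n)$, and the best sampled $x$ certifies at least that ratio. The function name and parameters are hypothetical.

```python
import math
import random


def lemma_7_4_demo(n, m, samples, seed=0):
    """Return (sum of ||Ax||_4^4 / sum of ||x||_2^4, best single ratio) over Gaussian samples x."""
    rng = random.Random(seed)
    # Hypothetical instance: m random sign vectors a_i in {-1, 1}^n, rows of A are a_i / sqrt(n).
    rows = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]
    num_total = den_total = 0.0
    best = 0.0
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ax = [sum(a_j * x_j for a_j, x_j in zip(a, x)) / math.sqrt(n) for a in rows]
        num = sum(v ** 4 for v in ax) / m         # ||Ax||_4^4, averaged over rows
        den = (sum(v * v for v in x) / n) ** 2    # ||x||_2^4, averaged over coordinates
        num_total += num
        den_total += den
        best = max(best, num / den)
    # Each num <= best * den, so sum(num) / sum(den) <= best: some sampled x does at least as well.
    return num_total / den_total, best


if __name__ == "__main__":
    ratio, best = lemma_7_4_demo(n=4, m=16, samples=20000)
    print(ratio, best, 3 / (1 + 2 / 4))
```

The simulation only illustrates the expectation computation; the lemma itself is the deterministic statement that some $x$ achieves the ratio of expectations.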

Remark 7.5. It is instructive to consider a variant of the above argument. A simpler upper bound on the value of $\text{Tensor-SDP}(A)$ is given simply by $\|A_{2,2}\|$. However, the presence of the $\Phi\Phi^T$ term means that this bound will be off by an $n$-dependent factor. Thus we observe that the symmetry constraints of $\text{Tensor-SDP}^{(4)}$ provide a crucial advantage over the simpler bound using eigenvalues. In the language of quantum information (see Section 9.3), this means that the PPT constraint is necessary for the approximation to succeed. See Section 9.3.2 for an example of this that applies to higher levels of the hierarchy as well.

On the other hand, when the $a_i$ are chosen to be random complex Gaussian vectors, we simply have $\mathbb{E}\, a_i a_i^* \otimes a_i a_i^* = I + F$. In this case, the upper bound $\text{Tensor-SDP}(A) \le \|A_{2,2}\|$ is already sufficient. Thus, only real random vectors demonstrate a separation between these two bounds.

8 The 2-to-q norm and small-set expansion 

In this section we show that a graph is a small-set expander if and only if the projector to the subspace of its adjacency matrix's top eigenvalues has a bounded $2\to q$ norm for even $q \ge 4$. While the "if" part was known before, the "only if" part is novel. This characterization of small-set expanders is of general interest, and also leads to a reduction from the Small-Set Expansion problem considered in [RS10] to the problem of obtaining a good approximation for the $2\to q$ norms.

Notation. For a regular graph $G = (V, E)$ and a subset $S \subseteq V$, we define the measure of $S$ to be $\mu(S) = |S|/|V|$, and we define $G(S)$ to be the distribution obtained by picking a random $x \in S$ and then outputting a random neighbor $y$ of $x$. We define the expansion of $S$ to be $\Phi_G(S) = \Pr_{y \sim G(S)}[y \notin S]$. For $\delta \in (0, 1)$, we define $\Phi_G(\delta) = \min_{S \subseteq V : \mu(S) \le \delta} \Phi_G(S)$. We often drop the subscript $G$ from $\Phi_G$ when it is clear from context. We identify $G$ with its normalized adjacency (i.e., random walk) matrix. For every $\lambda \in [-1, 1]$, we denote by $V_{\ge\lambda}(G)$ the subspace spanned by the eigenvectors of $G$ with eigenvalue at least $\lambda$. The projector onto this subspace is denoted $P_{\ge\lambda}(G)$. For a distribution $D$, we let $\mathrm{cp}(D)$ denote the collision probability of $D$ (the probability that two independent samples from $D$ are identical).

Our main theorem of this section is the following: 

Theorem (Restatement of Theorem 2.4). For every regular graph $G$, $\lambda > 0$ and even $q$,

1. (Norm bound implies expansion) For all $\delta > 0$, $\varepsilon > 0$, $\|P_{\ge\lambda}(G)\|_{2\to q} \le \varepsilon/\delta^{(q-2)/2q}$ implies that $\Phi_G(\delta) \ge 1 - \lambda - \varepsilon^2$.

2. (Expansion implies norm bound) There is a constant $c$ such that for all $\delta > 0$, $\Phi_G(\delta) > 1 - \lambda^{2^{cq}}$ implies $\|P_{\ge\lambda}(G)\|_{2\to q} \le 2/\sqrt{\delta}$.

One corollary of Theorem 2.4 is that a good approximation to the $2\to4$ norm implies an approximation of $\Phi_G(\delta)$.

Corollary 8.1. If there is a polynomial-time computable relaxation $\mathcal{R}$ yielding a good approximation for the $2\to4$ norm¹, then the Small-Set Expansion Hypothesis of [RS10] is false.

Proof. Using [RST10a], to refute the Small-Set Expansion Hypothesis it is enough to come up with an efficient algorithm that, given an input graph $G$ and sufficiently small $\delta > 0$, can distinguish between the Yes case, $\Phi_G(\delta) < 0.1$, and the No case, in which, for every $\delta' \ge \delta$, $\Phi_G(\delta')$ is at least 1 minus a quantity (depending on some constant $c$) that vanishes as $\delta' \to 0$. In particular, for all $\eta > 0$ and constant $d$, if $\delta$ is small enough then in the No case $\Phi_G(\delta^{1/d}) > 1 - \eta$. Using Theorem 2.4, in the Yes case we know $\|P_{\ge 1/2}(G)\|_{2\to4} \ge 1/(10\delta^{1/4})$, while in the No case, if we choose $\eta$ to be smaller than $\eta(1/2)$ in the Theorem, then we know that $\|P_{\ge 1/2}(G)\|_{2\to4} \le 2/\sqrt{\delta^{1/d}}$. Clearly, if we have a good approximation for the $2\to4$ norm then, for sufficiently small $\delta$, we can distinguish between these two cases. □

The first part of Theorem 2.4 follows from previous work (e.g., see [KV05]). For completeness, we include a proof in Appendix B. The second part will follow from the following lemma:

Lemma 8.2. Set $e = e(\lambda, q) := 2^{cq}/\lambda$, with a constant $c \le 100$. Then for every $\lambda > 0$ and $1 \ge \delta > 0$, if $G$ is a graph that satisfies $\mathrm{cp}(G(S)) \le 1/(e|S|)$ for all $S$ with $\mu(S) \le \delta$, then $\|f\|_q \le 2\|f\|_2/\sqrt{\delta}$ for all $f \in V_{\ge\lambda}(G)$.



Proving the second part of Theorem 2.4 from Lemma 8.2. We use the variant of the local Cheeger bound obtained in [Ste10, Theorem 2.1], stating that if $\Phi_G(\delta) \ge 1 - \eta$ then for every $f \in L_2(V)$ satisfying $\|f\|_1^2 \le \delta\|f\|_2^2$, we have $\|Gf\|_2^2 \le c\sqrt{\eta}\,\|f\|_2^2$. The proof follows by noting that for every set $S$, if $f$ is the characteristic function of $S$ then $\|f\|_1 = \|f\|_2^2 = \mu(S)$, and $\mathrm{cp}(G(S)) = \|Gf\|_2^2/(\mu(S)|S|)$. □
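The identity $\mathrm{cp}(G(S)) = \|G1_S\|_2^2/(\mu(S)|S|)$ is easy to confirm on a small example. The sketch below assumes expectation norms, $\|h\|_2^2 = \mathbb{E}_{x \in V} h(x)^2$ (as in the rest of this section), and uses the $n$-cycle (with $n \ge 3$) as a hypothetical regular graph; exact rational arithmetic avoids rounding. It computes the collision probability once directly from the definition of $G(S)$ and once through the norm formula.

```python
from fractions import Fraction


def cycle_walk(n):
    """Random-walk matrix of the n-cycle (n >= 3) as a list of {neighbor: probability} rows."""
    return [{(v - 1) % n: Fraction(1, 2), (v + 1) % n: Fraction(1, 2)} for v in range(n)]


def collision_probability_direct(G, S):
    """cp(G(S)): pick x uniformly in S, output a random neighbor y; sum of squared probabilities."""
    prob = {}
    for x in S:
        for y, w in G[x].items():
            prob[y] = prob.get(y, 0) + w / len(S)
    return sum(p * p for p in prob.values())


def collision_probability_via_norm(G, S):
    """||G 1_S||_2^2 / (mu(S) |S|), using the expectation norm over V."""
    n = len(G)
    g1s = [sum(w for y, w in G[x].items() if y in S) for x in range(n)]
    norm_sq = sum(v * v for v in g1s) / n
    mu = Fraction(len(S), n)
    return norm_sq / (mu * len(S))


if __name__ == "__main__":
    G = cycle_walk(7)
    S = {0, 2, 3}
    print(collision_probability_direct(G, S), collision_probability_via_norm(G, S))
```

The identity uses that $G$ is symmetric, so the probability that $G(S)$ outputs $z$ is $(G1_S)(z)/|S|$.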

Proof of Lemma 8.2. Fix $\lambda > 0$. We assume that the graph satisfies the condition of the Lemma with $e = 2^{cq}/\lambda$, for a constant $c$ that we'll set later. Let $G = (V, E)$ be such a graph, and let $f$ be a function in $V_{\ge\lambda}(G)$ with $\|f\|_2 = 1$ that maximizes $\|f\|_q$. We write $f = \sum_{i=1}^m \alpha_i \chi_i$, where $\chi_1, \ldots, \chi_m$ denote the eigenfunctions of $G$ with eigenvalues $\lambda_1, \ldots, \lambda_m$ that are at least $\lambda$. Assume towards a contradiction that $\|f\|_q > 2/\sqrt{\delta}$. We'll prove that $g = \sum_{i=1}^m (\alpha_i/\lambda_i)\chi_i$ satisfies $\|g\|_q \ge 10\|f\|_q/\lambda$. This is a contradiction since (using $\lambda_i \in [\lambda, 1]$) $\|g\|_2 \le \|f\|_2/\lambda$, and we assumed $f$ is a function in $V_{\ge\lambda}(G)$ with a maximal ratio of $\|f\|_q/\|f\|_2$.

Let $U \subseteq V$ be the set of vertices $x$ with $|f(x)| \ge 1/\sqrt{\delta}$. Using Markov and the fact that $\mathbb{E}_{x \in V}[f(x)^2] = 1$, we know that $\mu(U) = |U|/|V| \le \delta$, meaning that under our assumptions any subset $S \subseteq U$ satisfies $\mathrm{cp}(G(S)) \le 1/(e|S|)$. On the other hand, because $\|f\|_q^q > 2^q/\delta^{q/2}$, we know that $U$ contributes at least half of the term $\|f\|_q^q = \mathbb{E}_{x \in V} f(x)^q$. That is, if we define $\alpha$ to be $\mu(U)\,\mathbb{E}_{x \in U} f(x)^q$, then $\alpha \ge \|f\|_q^q/2$. We'll prove the lemma by showing that $\|g\|_q^q \ge 10\alpha/\lambda$.

Let $c$ be a sufficiently large constant ($c = 100$ will do). We define $U_i$ to be the set $\{x \in U : |f(x)| \in [c^i/\sqrt{\delta}, c^{i+1}/\sqrt{\delta})\}$, and let $I$ be the maximal $i$ such that $U_i$ is non-empty.

¹Note that although we use the $2\to4$ norm for simplicity, a similar result holds for the $2\to q$ norm for every constant even $q$.
Thus, the sets $U_0, \ldots, U_I$ form a partition of $U$ (where some of these sets may be empty). We let $\alpha_i$ be the contribution of $U_i$ to $\alpha$. That is, $\alpha_i = \mu_i\, \mathbb{E}_{x \in U_i} f(x)^q$, where $\mu_i = \mu(U_i)$. Note that $\alpha = \alpha_0 + \cdots + \alpha_I$. We'll show that there are some indices $i_1, \ldots, i_J$ such that:

(i) $\alpha_{i_1} + \cdots + \alpha_{i_J} \ge \alpha/(2c^{10})$.

(ii) For all $j \in [J]$, there is a non-negative function $g_j : V \to \mathbb{R}$ such that $\mathbb{E}_{x \in V} g_j(x)^q \ge e\,\alpha_{i_j}/(10c^2)^{q/2}$.

(iii) For every $x \in V$, $g_1(x) + \cdots + g_J(x) \le |g(x)|$.

Showing these will complete the proof, since it is easy to see that for two non-negative functions $g', g''$ and even $q$, $\mathbb{E}(g'(x) + g''(x))^q \ge \mathbb{E} g'(x)^q + \mathbb{E} g''(x)^q$, and hence (ii) and (iii) imply that

$$\|g\|_q^q = \mathop{\mathbb{E}}_x g(x)^q \ge \frac{e}{(10c^2)^{q/2}} \sum_{j=1}^J \alpha_{i_j}. \qquad (8.1)$$

Using (i), we conclude that for $e \ge (10c)^{O(q)}/\lambda$, the right-hand side of (8.1) will be larger than $10\alpha/\lambda$.

We find the indices $i_1, \ldots, i_J$ iteratively. We let $\mathcal{I}$ be initially the set $\{0, \ldots, I\}$ of all indices. For $j = 1, 2, \ldots$ we do the following as long as $\mathcal{I}$ is not empty:

1. Let $i_j$ be the largest index in $\mathcal{I}$.

2. Remove from $\mathcal{I}$ every index $i$ such that $\alpha_i < c^{10}\alpha_{i_j}/2^{i_j - i}$.
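The selection procedure above can be sketched as follows, assuming (hypothetically) that the contributions $\alpha_0, \ldots, \alpha_I$ are given as a list of positive numbers and that $c > 1$; positivity guarantees that the chosen index $i_j$ is itself removed in step 2, so the loop terminates. The check in the usage example is property (i): the indices removed at step $j$ have total contribution less than $c^{10}\alpha_{i_j}\sum_{t \ge 0}2^{-t} = 2c^{10}\alpha_{i_j}$.

```python
import random


def select_indices(alpha, c):
    """Greedy selection i_1 > i_2 > ... from level-set contributions alpha[0..I] (all > 0, c > 1)."""
    remaining = set(range(len(alpha)))
    chosen = []
    while remaining:
        i_j = max(remaining)                 # step 1: the largest remaining index
        chosen.append(i_j)
        threshold = c ** 10 * alpha[i_j]
        # step 2: drop every i with alpha_i < c^10 * alpha_{i_j} / 2^(i_j - i)
        remaining = {i for i in remaining if not alpha[i] < threshold / 2 ** (i_j - i)}
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    alpha = [rng.random() for _ in range(12)]
    picked = select_indices(alpha, c=2.0)
    print(picked, sum(alpha[i] for i in picked), sum(alpha) / (2 * 2.0 ** 10))
```

Because each step takes the largest surviving index, the selected indices come out in descending order, which is the property used later in the proof.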

We let $J$ denote the step when we stop. Note that our indices $i_1, \ldots, i_J$ are sorted in descending order. For every step $j$, the total of the $\alpha_i$'s for all indices we removed is less than $2c^{10}\alpha_{i_j}$, and hence we satisfy (i). The crux of our argument will be to show (ii) and (iii). They will follow from the following claim:

Claim 8.3. Let $S \subseteq V$ and $\beta > 0$ be such that $\mu(S) \le \delta$ and $|f(x)| \ge \beta$ for all $x \in S$. Then there is a set $T$ of size at least $e|S|$ such that $\mathbb{E}_{x \in T} g(x)^2 \ge \beta^2/4$.

The claim will follow from the following lemma:

Lemma 8.4. Let $D$ be a distribution with $\mathrm{cp}(D) \le 1/N$ and $g$ be some function. Then there is a set $T$ of size $N$ such that $\mathbb{E}_{x \in T} g(x)^2 \ge (\mathbb{E}|g(D)|)^2/4$.

Proof. Identify the support of $D$ with the set $[M]$ for some $M$, let $p_i$ denote the probability that $D$ outputs $i$, and sort the $p_i$'s such that $p_1 \ge p_2 \ge \cdots \ge p_M$. We let $\beta'$ denote $\mathbb{E}|g(D)|$; that is, $\beta' = \sum_{i=1}^M p_i |g(i)|$. We separate into two cases. If $\sum_{i > N} p_i |g(i)| \ge \beta'/2$, we define the distribution $D'$ as follows: we set $\Pr[D' = i]$ to be $p_i$ for $i > N$, and we let all $i \le N$ be equiprobable (that is, each is output with probability $(\sum_{i=1}^N p_i)/N$). Clearly, $\mathbb{E}|g(D')| \ge \sum_{i > N} p_i |g(i)| \ge \beta'/2$. On the other hand, the maximum probability of any element in $D'$ is at most $1/N$ (for $i > N$ this holds because $p_i \le p_N$ and $N p_N^2 \le \mathrm{cp}(D) \le 1/N$), so $D'$ can be expressed as a convex combination of flat distributions over sets of size $N$, implying that one of these sets $T$ satisfies $\mathbb{E}_{x \in T} |g(x)| \ge \beta'/2$, and hence $\mathbb{E}_{x \in T} g(x)^2 \ge \beta'^2/4$.

The other case is that $\sum_{i=1}^N p_i |g(i)| \ge \beta'/2$. In this case we use Cauchy-Schwarz and argue that

$$\frac{\beta'^2}{4} \le \Big(\sum_{i=1}^N p_i |g(i)|\Big)^2 \le \Big(\sum_{i=1}^N p_i^2\Big)\Big(\sum_{i=1}^N g(i)^2\Big). \qquad (8.2)$$

But using our bound on the collision probability, the right-hand side of (8.2) is upper bounded by $\frac{1}{N}\sum_{i=1}^N g(i)^2 = \mathbb{E}_{x \in [N]} g(x)^2$. □
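Lemma 8.4 can be sanity-checked by brute force on random instances. In the sketch below (hypothetical helper names), $N$ is taken as $\lfloor 1/\mathrm{cp}(D) \rfloor$ so that $\mathrm{cp}(D) \le 1/N$, and the best set $T$ of size $N$ is simply the $N$ points with the largest values of $g(x)^2$; the lemma predicts that its average of $g^2$ is at least $(\mathbb{E}|g(D)|)^2/4$.

```python
import random


def collision_probability(p):
    """cp(D) = sum_i p_i^2."""
    return sum(pi * pi for pi in p)


def best_set_average(g, size):
    """max over |T| = size of E_{x in T} g(x)^2, attained by the largest values of g^2."""
    top = sorted((v * v for v in g), reverse=True)[:size]
    return sum(top) / size


def check_lemma_8_4(rng, support):
    """Random D on [support] and random g; return (best average of g^2 over |T| = N, (E|g(D)|)^2 / 4)."""
    weights = [rng.random() ** 3 for _ in range(support)]
    total = sum(weights)
    p = [w / total for w in weights]
    g = [rng.gauss(0.0, 1.0) for _ in range(support)]
    N = int(1 / collision_probability(p))            # ensures cp(D) <= 1/N and 1 <= N <= support
    beta = sum(pi * abs(gi) for pi, gi in zip(p, g))  # E|g(D)|
    return best_set_average(g, N), beta ** 2 / 4


if __name__ == "__main__":
    rng = random.Random(0)
    print([check_lemma_8_4(rng, 20) for _ in range(3)])
```

In practice the bound is loose: the Cauchy-Schwarz case even gives $\sum_{x \in T} g(x)^2 \ge N\beta'^2/4$, which is $N$ times stronger than what the lemma records.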
Proof of Claim 8.3 from Lemma 8.4. By construction $f = Gg$, and hence we know that for every $x$, $f(x) = \mathbb{E}_{y \sim x} g(y)$. This means that if we let $D$ be the distribution $G(S)$, then

$$\mathbb{E}|g(D)| = \mathop{\mathbb{E}}_{x \in S} \mathop{\mathbb{E}}_{y \sim x} |g(y)| \ge \mathop{\mathbb{E}}_{x \in S} \Big|\mathop{\mathbb{E}}_{y \sim x} g(y)\Big| = \mathop{\mathbb{E}}_{x \in S} |f(x)| \ge \beta.$$

By the expansion property of $G$, $\mathrm{cp}(D) \le 1/(e|S|)$ and thus, by Lemma 8.4, there is a set $T$ of size $e|S|$ satisfying $\mathbb{E}_{x \in T} g(x)^2 \ge \beta^2/4$. □

We will construct the functions $g_1, \ldots, g_J$ by applying Claim 8.3 iteratively. We do the following for $j = 1, \ldots, J$:

1. Let $T_j$ be the set of size $e|U_{i_j}|$ that is obtained by applying Claim 8.3 to the function $f$ and the set $U_{i_j}$. Note that $\mathbb{E}_{x \in T_j} g(x)^2 \ge \beta_{i_j}^2/4$, where we let $\beta_i = c^i/\sqrt{\delta}$ (and hence for every $x \in U_i$, $\beta_i \le |f(x)| < c\beta_i$).

2. Let $g'_j$ be the function that on input $x$ outputs $\gamma \cdot |g(x)|$ if $x \in T_j$ and 0 otherwise, where $\gamma \le 1$ is a scaling factor that ensures that $\mathbb{E}_{x \in T_j} g'_j(x)^2$ equals exactly $\beta_{i_j}^2/4$.

3. We define $g_j(x) = \max\{0, g'_j(x) - \sum_{k < j} g_k(x)\}$.
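The truncation in step 3 is what yields property (iii). A small sketch (hypothetical names; each $g'_j$ is given as a list of non-negative values on a common vertex set, bounded by $|g|$ as in step 2) computes the $g_j$ and can be used to confirm the pointwise invariant $g_1 + \cdots + g_j = \max_{k \le j} g'_k \le |g|$.

```python
import random


def truncate(g_primes):
    """g_j(x) = max(0, g'_j(x) - sum_{k<j} g_k(x)), computed pointwise for each j."""
    n = len(g_primes[0])
    running = [0.0] * n            # running[x] = g_1(x) + ... + g_{j-1}(x)
    gs = []
    for gp in g_primes:
        gj = [max(0.0, gp[x] - running[x]) for x in range(n)]
        gs.append(gj)
        # If g'_j(x) exceeds the running sum, the new sum equals g'_j(x); otherwise it is unchanged.
        running = [running[x] + gj[x] for x in range(n)]
    return gs


if __name__ == "__main__":
    rng = random.Random(0)
    g = [rng.gauss(0, 1) for _ in range(5)]
    g_primes = [[rng.random() * abs(v) for v in g] for _ in range(3)]
    for gj in truncate(g_primes):
        print([round(v, 3) for v in gj])
```

The running sum after step $j$ is the pointwise maximum of $g'_1, \ldots, g'_j$, which is at most $|g|$ because every $g'_k$ is.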

Note that the second step ensures that $g'_j(x) \le |g(x)|$, while the third step ensures that $g_1(x) + \cdots + g_j(x) \le \max_{k \le j} g'_k(x)$ for all $j$, and in particular $g_1(x) + \cdots + g_J(x) \le |g(x)|$. Hence the only thing left to prove is the following:

Claim 8.5. $\mathbb{E}_{x \in V} g_j(x)^q \ge e\,\alpha_{i_j}/(10c^2)^{q/2}$.

Proof. Recall that for every $i$, $\alpha_i = \mu_i\, \mathbb{E}_{x \in U_i} f(x)^q$, and hence (using $|f(x)| \in [\beta_i, c\beta_i)$ for $x \in U_i$):

$$\mu_i \beta_i^q \le \alpha_i \le \mu_i c^q \beta_i^q. \qquad (8.3)$$

Now fix $T = T_j$. Since $\mathbb{E}_{x \in V} g_j(x)^q$ is at least (in fact equal to) $\mu(T)\,\mathbb{E}_{x \in T} g_j(x)^q$ and $\mu(T) = e\,\mu(U_{i_j})$, we can use (8.3) and $\mathbb{E}_{x \in T} g_j(x)^q \ge (\mathbb{E}_{x \in T} g_j(x)^2)^{q/2}$ to reduce proving the claim to showing the following:

$$\mathop{\mathbb{E}}_{x \in T} g_j(x)^2 \ge (c\beta_{i_j})^2/(10c^2) = \beta_{i_j}^2/10. \qquad (8.4)$$

We know that $\mathbb{E}_{x \in T} g'_j(x)^2 = \beta_{i_j}^2/4$. We claim that (8.4) will follow by showing that for every $k < j$,

$$\mathop{\mathbb{E}}_{x \in T} g'_k(x)^2 \le 100^{-i'} \cdot \beta_{i_j}^2/4, \qquad (8.5)$$

where $i' = i_k - i_j$. (Note that $i' > 0$ since in our construction the indices $i_1, \ldots, i_J$ are sorted in descending order.)

Indeed, (8.5) means that if we momentarily let $\|h\|$ denote $\sqrt{\mathbb{E}_{x \in T} h(x)^2}$, then

$$\|g_j\| \ge \|g'_j\| - \Big\|\sum_{k < j} g_k\Big\| \ge \|g'_j\| - \sum_{k < j} \|g'_k\| \ge \|g'_j\| \Big(1 - \sum_{i'=1}^{\infty} 10^{-i'}\Big) \ge 0.8\, \|g'_j\|. \qquad (8.6)$$

The first inequality holds because we can write $g_j$ as $g'_j - h_j$, where $h_j = \min\{g'_j, \sum_{k < j} g_k\}$. Then, on the one hand, $\|g_j\| \ge \|g'_j\| - \|h_j\|$, and on the other hand, $\|h_j\| \le \|\sum_{k < j} g_k\|$ since $g'_j \ge 0$. The second inequality holds because $\|g_k\| \le \|g'_k\|$. By squaring (8.6) and plugging in the value of $\|g'_j\|^2$, we get (8.4).
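Proof of (8.5). By our construction, it must hold that

$$c^{10}\alpha_{i_k}/2^{i'} \le \alpha_{i_j}, \qquad (8.7)$$

since otherwise the index $i_j$ would have been removed from $\mathcal{I}$ at the $k$th step. Since $\beta_{i_k} = \beta_{i_j} c^{i'}$, we can plug (8.3) into (8.7) to get

$$c^{10} \mu_{i_k} c^{q i'} \beta_{i_j}^q / 2^{i'} \le \mu_{i_j} c^q \beta_{i_j}^q,$$

or

$$\mu_{i_k}/\mu_{i_j} \le 2^{i'} c^{q - 10 - q i'}.$$

Since $|T_i| = e|U_i|$ for all $i$, it follows that $|T_k|/|T| \le 2^{i'} c^{q - 10 - q i'}$. On the other hand, we know that $\mathbb{E}_{x \in T_k} g'_k(x)^2 = \beta_{i_k}^2/4 = c^{2i'} \beta_{i_j}^2/4$. Thus,

$$\mathop{\mathbb{E}}_{x \in T} g'_k(x)^2 \le \frac{|T_k|}{|T|} \mathop{\mathbb{E}}_{x \in T_k} g'_k(x)^2 \le 2^{i'} c^{q - 10 - (q-2) i'} \cdot \beta_{i_j}^2/4 \le (2/c^2)^{i'} \cdot \beta_{i_j}^2/4,$$

where the last step uses $q \ge 4$ and $i' \ge 1$, and now we just choose $c$ sufficiently large so that $c^2/2 \ge 100$. □

□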

9 Relating the 2-to-4 norm and the injective tensor norm 

In this section, we present several equivalent formulations of the 2-to-4 norm: 1) as the 
injective tensor norm of a 4-tensor, 2) as the injective tensor norm of a 3-tensor, and 3) as 
the maximum of a linear function over a convex set, albeit a set where the weak membership 
problem is hard. Additionally, we can consider maximizations over real or complex vectors. 
These equivalent formulations are discussed in Section 9.1. 

We use this to show hardness of approximation (Theorem 2.5) for the 2-to-4 norm 
in Section 9.2, and then show positive algorithmic results (Theorem 2.3) in Section 9.3. 
Somewhat surprisingly, many of the key arguments in these sections are imported from the 
quantum information literature, even though no quantum algorithms are involved. It is an 
interesting question to find a more elementary proof of the result in Section 9.3. 

We will generally work with the counting norms $\|\cdot\|$, defined as $\|x\|_p := (\sum_i |x_i|^p)^{1/p}$, and the counting inner product, defined by $\langle x, y\rangle := x^* y$, where $^*$ denotes the conjugate transpose.

9.1 Equivalent maximizations 

9.1.1 Injective tensor norm and separable states 

Recall from the introduction the definition of the injective tensor norm: if $V_1, \ldots, V_r$ are vector spaces with $T \in V_1 \otimes \cdots \otimes V_r$, then $\|T\|_{\mathrm{inj}} = \max\{|\langle T, x_1 \otimes \cdots \otimes x_r\rangle| : x_1 \in S(V_1), \ldots, x_r \in S(V_r)\}$, where $S(V)$ denotes the $L_2$-unit vectors in a vector space $V$. In this paper we use the term "injective tensor norm" to mean the injective tensor norm of $\ell_2$ spaces, and we caution the reader that in other contexts it has a more general meaning. These norms were introduced by Grothendieck, and they are further discussed in [Rya02].

We will also need the definition of separable states from quantum information. For a vector space $V$, define $L(V)$ to be the linear operators on $V$, and define $D(V) := \{\rho \in L(V) : \rho \succeq 0, \mathrm{Tr}\,\rho = 1\} = \mathrm{conv}\{vv^* : v \in S(V)\}$ to be the density operators on $V$. The trace induces an inner product on operators: $\langle X, Y\rangle := \mathrm{Tr}\, X^* Y$. An important class of density operators are the separable density operators. For vector spaces $V_1, \ldots, V_r$, these are

$$\mathrm{Sep}(V_1, \ldots, V_r) := \mathrm{conv}\{v_1 v_1^* \otimes \cdots \otimes v_r v_r^* : v_i \in S(V_i)\}.$$

If $V = V_1 = \cdots = V_r$, then let $\mathrm{Sep}^r(V)$ denote $\mathrm{Sep}(V_1, \ldots, V_r)$. Physically, density operators are the quantum analogues of probability distributions, and separable density operators describe unentangled quantum states; conversely, entangled states are defined to be the set of density operators that are not separable. For readers familiar with quantum information, we point out that our treatment differs principally in its use of the expectation for norms and inner products, rather than the sum.

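For any bounded convex set $K$, define the support function of $K$ to be

$$h_K(x) := \max_{y \in K} |\langle x, y\rangle|.$$

Define $e_i \in \mathbb{F}^n$ to be the vector with 1 in the $i$-th position. Now we can give the convex-optimization formulation of the injective tensor norm.

Lemma 9.1. Let $V_1, \ldots, V_r$ be vector spaces with $n_i := \dim V_i$, and $T \in V_1 \otimes \cdots \otimes V_r$. Choose an orthonormal basis $e_1, \ldots, e_{n_r}$ for $V_r$. Define $T_1, \ldots, T_{n_r} \in V_1 \otimes \cdots \otimes V_{r-1}$ by $T = \sum_i T_i \otimes e_i$ and define $M \in L(V_1 \otimes \cdots \otimes V_{r-1})$ by $M = \sum_i T_i T_i^*$. Then

$$\|T\|_{\rm inj}^2 = h_{\operatorname{Sep}(V_1, \ldots, V_{r-1})}(M). \tag{9.1}$$

Observe that any $M \succeq 0$ can be expressed in this form, possibly by padding $n_r$ to be at least $\operatorname{rank} M$. Thus calculating $\|\cdot\|_{\rm inj}$ for $r$-tensors is equivalent in difficulty to computing $h_{\operatorname{Sep}^{r-1}}$ for p.s.d. arguments. This argument appeared before in [HM10], where it was explained using quantum information terminology.

It is instructive to consider the $r = 2$ case. In this case, $T$ is equivalent to a matrix $\hat T$ and $\|T\|_{\rm inj} = \|\hat T\|_{2\to2}$. Moreover, $\operatorname{Sep}^1(\mathbb{F}^{n_1}) = D(\mathbb{F}^{n_1})$ is simply the convex hull of $vv^*$ for unit vectors $v$. Thus $h_{\operatorname{Sep}^1(\mathbb{F}^{n_1})}(M)$ is simply the maximum eigenvalue of $M = \hat T \hat T^*$. In this case, Lemma 9.1 merely states that the square of the largest singular value of $\hat T$ is the largest eigenvalue of $\hat T \hat T^*$. The general proof follows this framework.

Proof of Lemma 9.1.

$$\|T\|_{\rm inj} = \max_{x_1 \in S(V_1), \ldots, x_r \in S(V_r)} |\langle T, x_1 \otimes \cdots \otimes x_r\rangle| \tag{9.2}$$

$$= \max_{x_1 \in S(V_1), \ldots, x_r \in S(V_r)} \Big|\sum_{i=1}^{n_r} \langle T_i, x_1 \otimes \cdots \otimes x_{r-1}\rangle \langle e_i, x_r\rangle\Big| \tag{9.3}$$

$$= \max_{x_1 \in S(V_1), \ldots, x_{r-1} \in S(V_{r-1})} \Big\|\sum_{i=1}^{n_r} \langle T_i, x_1 \otimes \cdots \otimes x_{r-1}\rangle\, e_i\Big\|_2, \tag{9.4}$$

where (9.4) uses $\max_{x_r \in S(V_r)} |\langle v, x_r\rangle| = \|v\|_2$. Therefore

$$\|T\|_{\rm inj}^2 = \max_{x_1, \ldots, x_{r-1}} \Big\|\sum_{i=1}^{n_r} \langle T_i, x_1 \otimes \cdots \otimes x_{r-1}\rangle\, e_i\Big\|_2^2 \tag{9.5}$$

$$= \max_{x_1, \ldots, x_{r-1}} \sum_{i=1}^{n_r} |\langle T_i, x_1 \otimes \cdots \otimes x_{r-1}\rangle|^2 \tag{9.6}$$

$$= \max_{x_1, \ldots, x_{r-1}} \Big\langle \sum_{i=1}^{n_r} T_i T_i^*,\ x_1 x_1^* \otimes \cdots \otimes x_{r-1} x_{r-1}^*\Big\rangle \tag{9.7}$$

$$= h_{\operatorname{Sep}(V_1, \ldots, V_{r-1})}\Big(\sum_{i=1}^{n_r} T_i T_i^*\Big), \tag{9.8}$$

where the maxima in (9.5)-(9.7) range over $x_j \in S(V_j)$, and (9.8) holds because a linear function on $\operatorname{Sep}(V_1, \ldots, V_{r-1})$ attains its maximum at an extreme point and $\langle \sum_i T_i T_i^*, \rho\rangle \geq 0$ for every $\rho \succeq 0$. □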
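The $r = 2$ case discussed before the proof can be checked numerically. The following is a minimal sketch, assuming a real $2 \times 2$ matrix $\hat T$ (a simplification made only for brevity; the helper names are hypothetical). It computes $\|\hat T\|_{\rm inj}^2$ by maximizing over $y$ in closed form, exactly as in the passage from (9.2) to (9.4), and over $x$ by a grid search, and compares the result with the largest eigenvalue of $M = \hat T \hat T^T$.

```python
import math


def lambda_max_sym2(m):
    """Largest eigenvalue of a real symmetric 2x2 matrix [[a, b], [b, d]]."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    return (a + d + math.sqrt((a - d) ** 2 + 4 * b * b)) / 2


def inj_norm_sq_grid(t, steps=4000):
    """Approximates ||T||_inj^2 = max over unit x, y of (x^T T y)^2 for a real 2x2 T.

    The inner maximum over y is taken in closed form, max_y (x^T T y)^2 =
    ||T^T x||_2^2 (the step from (9.2) to (9.4)); only x is gridded.
    Angles in [0, pi) suffice because x and -x give the same value.
    """
    best = 0.0
    for k in range(steps):
        theta = math.pi * k / steps
        x0, x1 = math.cos(theta), math.sin(theta)
        v0 = x0 * t[0][0] + x1 * t[1][0]  # (T^T x)_0
        v1 = x0 * t[0][1] + x1 * t[1][1]  # (T^T x)_1
        best = max(best, v0 * v0 + v1 * v1)
    return best


T = [[1.0, 2.0], [3.0, 4.0]]
M = [[sum(T[i][k] * T[j][k] for k in range(2)) for j in range(2)] for i in range(2)]  # M = T T^T
print(inj_norm_sq_grid(T), lambda_max_sym2(M))  # agree up to the grid resolution
```

Because the grid only samples $x$, its value can only fall short of $\lambda_{\max}(M)$, and the gap shrinks quadratically with the grid spacing.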



In what follows, we will also need to make use of some properties of symmetric tensors. Define $S_k$ to be the group of permutations of $[k]$ and define $P_n(\pi) \in L((\mathbb{F}^n)^{\otimes k})$ to be the operator that permutes $k$ tensor copies of $\mathbb{F}^n$ according to $\pi$. Formally,

$$P_n(\pi) := \sum_{i_1, \ldots, i_k \in [n]} \bigotimes_{j=1}^{k} e_{i_{\pi(j)}} e_{i_j}^*. \tag{9.9}$$

Then define $\vee^k \mathbb{F}^n$ to be the subspace of vectors in $(\mathbb{F}^n)^{\otimes k}$ that are unchanged by each $P_n(\pi)$. This space is called the symmetric subspace. A classic result in symmetric polynomials states that $\vee^k \mathbb{F}^n$ is spanned by the vectors $\{v^{\otimes k} : v \in \mathbb{F}^n\}$.²
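To make (9.9) concrete, the sketch below stores a $k$-tensor as a Python dictionary from index tuples to entries (an illustrative representation we chose, not one from the text), applies $P_n(\pi)$ with $\pi$ given 0-indexed, and tests membership in $\vee^k \mathbb{F}^n$ by checking invariance under every $\pi \in S_k$.

```python
import itertools


def tensor_power(v, k):
    """v^{(x)k} as a dict mapping index tuples (i_1, ..., i_k) to entries."""
    out = {}
    for idx in itertools.product(range(len(v)), repeat=k):
        val = 1
        for i in idx:
            val *= v[i]
        out[idx] = val
    return out


def apply_perm(t, pi):
    """Apply P_n(pi) as in (9.9): e_{i_1} (x) ... (x) e_{i_k} is sent to
    e_{i_pi(1)} (x) ... (x) e_{i_pi(k)}; pi is a 0-indexed tuple."""
    return {tuple(idx[p] for p in pi): val for idx, val in t.items()}


def is_symmetric(t, k, tol=1e-12):
    """True if t is unchanged by every P_n(pi) with pi in S_k."""
    for pi in itertools.permutations(range(k)):
        u = apply_perm(t, pi)
        keys = set(t) | set(u)
        if any(abs(t.get(key, 0) - u.get(key, 0)) > tol for key in keys):
            return False
    return True


print(is_symmetric(tensor_power([1.0, -2.0, 0.5], 3), 3))  # True
print(is_symmetric({(0, 1): 1.0}, 2))                       # False: e_1 (x) e_2
```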

One important fact about symmetric tensors is that for the injective tensor norm, the vectors in the maximization can be taken to be equal. Formally,

Fact 9.2. If $T \in \vee^r \mathbb{F}^n$ then

$$\|T\|_{\rm inj} = \max_{x \in S(\mathbb{F}^n)} |\langle T, x^{\otimes r}\rangle|. \tag{9.10}$$

This has been proven in several different works; see the paragraph above Eq. (3.1) of [CKP00] for references.

9.1.2 Connection to the 2-to-4 norm 

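Let $A = \sum_{i=1}^m e_i a_i^T$, so that $a_1, \ldots, a_m \in \mathbb{R}^n$ are the rows of $A$. Define

$$A_4 = \sum_{i=1}^m a_i^{\otimes 4} \in (\mathbb{R}^n)^{\otimes 4} \tag{9.11}$$

$$A_3 = \sum_{i=1}^m a_i \otimes a_i \otimes e_i \in \mathbb{R}^n \otimes \mathbb{R}^n \otimes \mathbb{R}^m \tag{9.12}$$

$$A_{2,2} = \sum_{i=1}^m a_i a_i^T \otimes a_i a_i^T \in L((\mathbb{R}^n)^{\otimes 2}) \tag{9.13}$$

The subscripts indicate that $A_r$ is an $r$-tensor, and $A_{r,s}$ is a map from $r$-tensors to $s$-tensors.

Further, for a real tensor $T \in (\mathbb{R}^n)^{\otimes r}$, define $\|T\|_{\rm inj[\mathbb{C}]}$ to be the injective tensor norm that results from treating $T$ as a complex tensor; that is, $\max\{|\langle T, x_1 \otimes \cdots \otimes x_r\rangle| : x_1, \ldots, x_r \in S(\mathbb{C}^n)\}$. For $r \geq 3$, $\|T\|_{\rm inj[\mathbb{C}]}$ can be larger than $\|T\|_{\rm inj}$ by as much as $\sqrt{2}$ [CKP00].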

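Our main result on equivalent forms of the 2-to-4 norm is the following.

Lemma 9.3.

$$\|A\|_{2\to4}^4 = \|A_4\|_{\rm inj} = \|A_3\|_{\rm inj}^2 = \|A_4\|_{\rm inj[\mathbb{C}]} = \|A_3\|_{\rm inj[\mathbb{C}]}^2 = h_{\operatorname{Sep}^2(\mathbb{R}^n)}(A_{2,2}) = h_{\operatorname{Sep}^2(\mathbb{C}^n)}(A_{2,2})$$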

² For the proof, observe that $v^{\otimes k} \in \vee^k \mathbb{F}^n$ for any $v \in \mathbb{F}^n$. To construct a basis for $\vee^k \mathbb{F}^n$ out of linear combinations of different $v^{\otimes k}$, let $z_1, \ldots, z_n$ be indeterminates and evaluate the $k$-fold derivatives of $(z_1 e_1 + \cdots + z_n e_n)^{\otimes k}$ at $z_1 = \cdots = z_n = 0$.
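Proof.

$$\|A\|_{2\to4}^4 = \max_{x \in S(\mathbb{R}^n)} \sum_{i=1}^m \langle a_i, x\rangle^4 \tag{9.14}$$

$$= \max_{x \in S(\mathbb{R}^n)} \langle A_4, x^{\otimes 4}\rangle \tag{9.15}$$

$$= \max_{x_1, x_2, x_3, x_4 \in S(\mathbb{R}^n)} |\langle A_4, x_1 \otimes x_2 \otimes x_3 \otimes x_4\rangle| \tag{9.16}$$

$$= \|A_4\|_{\rm inj} \tag{9.17}$$

Here (9.16) follows from Fact 9.2.

Next one can verify with direct calculation (and using $\max_{z \in S(\mathbb{R}^m)} \langle v, z\rangle = \|v\|_2$) that

$$\max_{x \in S(\mathbb{R}^n)} \langle A_4, x^{\otimes 4}\rangle = \max_{x \in S(\mathbb{R}^n)} \langle A_{2,2}, xx^T \otimes xx^T\rangle = \max_{x \in S(\mathbb{R}^n)} \max_{z \in S(\mathbb{R}^m)} \langle A_3, x \otimes x \otimes z\rangle^2. \tag{9.18}$$

Now define $z(i) := \langle e_i, z\rangle$ and continue.

$$\max_{x \in S(\mathbb{R}^n)} \max_{z \in S(\mathbb{R}^m)} |\langle A_3, x \otimes x \otimes z\rangle| = \max_{x \in S(\mathbb{R}^n)} \max_{z \in S(\mathbb{R}^m)} \operatorname{Re} \sum_{i=1}^m z(i) \langle a_i, x\rangle^2 \tag{9.19}$$

$$= \max_{x \in S(\mathbb{R}^n)} \max_{z \in S(\mathbb{C}^m)} \operatorname{Re} \sum_{i=1}^m z(i) \langle a_i, x\rangle^2 \tag{9.20}$$

$$= \max_{z \in S(\mathbb{C}^m)} \Big\|\sum_{i=1}^m z(i)\, a_i a_i^T\Big\|_{2\to2} \tag{9.21}$$

$$= \max_{z \in S(\mathbb{C}^m)} \max_{x, y \in S(\mathbb{C}^n)} \operatorname{Re} \sum_{i=1}^m z(i) \langle x, a_i\rangle \langle a_i, y\rangle \tag{9.22}$$

$$= \|A_3\|_{\rm inj[\mathbb{C}]} = \|A_3\|_{\rm inj} \tag{9.23}$$

From Lemma 9.1, we thus have $\|A\|_{2\to4}^4 = h_{\operatorname{Sep}^2(\mathbb{R}^n)}(A_{2,2}) = h_{\operatorname{Sep}^2(\mathbb{C}^n)}(A_{2,2})$.

To justify (9.22), we argue that the maximum in (9.21) is achieved by taking all the $z(i)$ real (and indeed nonnegative). The resulting matrix $\sum_i z(i) a_i a_i^T$ is real and symmetric, so its operator norm is achieved by taking $x = y$ to be real vectors. Thus, the maximum in $\|A_3\|_{\rm inj[\mathbb{C}]}$ is achieved for real $x, y, z$, and as a result $\|A_3\|_{\rm inj[\mathbb{C}]} = \|A_3\|_{\rm inj}$.

Having now made the bridge to complex vectors, we can work backwards to establish the last equivalence, which concerns $\|A_4\|_{\rm inj[\mathbb{C}]}$. Repeating the argument that led to (9.17) will establish that

$$\|A_4\|_{\rm inj[\mathbb{C}]} = \max_{x \in S(\mathbb{C}^n)} \max_{z \in S(\mathbb{C}^m)} |\langle A_3, x \otimes x \otimes z\rangle|^2 = \|A_3\|_{\rm inj[\mathbb{C}]}^2. \qquad \square$$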
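Lemma 9.3 can be sanity-checked on a small example. The following sketch uses a hypothetical $3 \times 2$ matrix $A$ and a grid of unit vectors in $\mathbb{R}^2$; it compares $\max_x \sum_i \langle a_i, x\rangle^4$ from (9.14) with $h_{\operatorname{Sep}^2(\mathbb{R}^n)}(A_{2,2}) = \max_{x,y} \sum_i \langle a_i, x\rangle^2 \langle a_i, y\rangle^2$, which allows $x \neq y$. By Cauchy-Schwarz the two maxima coincide, so the grid values should differ only by discretization error.

```python
import math

# Hypothetical example matrix A with rows a_1, a_2, a_3 in R^2.
ROWS = [(1.0, 0.0), (1.0, 1.0), (0.5, -1.0)]
STEPS = 360  # grid resolution for unit vectors in R^2 (angles in [0, pi))

GRID = [(math.cos(math.pi * t / STEPS), math.sin(math.pi * t / STEPS)) for t in range(STEPS)]
# SQ[t][i] = <a_i, x_t>^2 for the t-th grid point x_t.
SQ = [[(a[0] * x[0] + a[1] * x[1]) ** 2 for a in ROWS] for x in GRID]

# ||A||_{2->4}^4 = max_x sum_i <a_i, x>^4, as in (9.14).
norm4 = max(sum(s * s for s in row) for row in SQ)

# h_Sep(A_{2,2}) = max_{x,y} <A_{2,2}, xx^T (x) yy^T> = max_{x,y} sum_i <a_i,x>^2 <a_i,y>^2.
h_sep = max(sum(p * q for p, q in zip(rx, ry)) for rx in SQ for ry in SQ)

print(norm4, h_sep)  # equal up to grid resolution, as Lemma 9.3 predicts
```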



9.2 Hardness of approximation for the 2-to-4 norm 

This section is devoted to the proof of Theorem 2.5, establishing hardness of approximation for the 2-to-4 norm.

First, we restate Theorem 2.5 more precisely. We omit the reduction to when A is a 
projector, deferring this argument to Corollary 9.9, where we will further use a randomized 
reduction. 

Theorem 9.4 (restatement of Theorem 2.5). Let $\phi$ be a 3-SAT instance with $n$ variables and $O(n)$ clauses. Determining whether $\phi$ is satisfiable can be reduced in polynomial time to determining whether $\|A\|_{2\to4} \geq C$ or $\|A\|_{2\to4} \leq c$, where $0 < c < C$ and $A$ is an $m \times m$ matrix. This is possible for two choices of parameters:

1. $m = \operatorname{poly}(n)$, and $C/c \geq 1 + 1/(n \operatorname{polylog}(n))$; or,

2. $m = \exp(\sqrt{n}\, \operatorname{polylog}(n) \log(C/c))$.

The key challenge is establishing the following reduction. 

Lemma 9.5. Let $M \in L(\mathbb{C}^n \otimes \mathbb{C}^n)$ satisfy $0 \preceq M \preceq I$. Assume that either (case Y) $h_{\operatorname{Sep}(n,n)}(M) = 1$ or (case N) $h_{\operatorname{Sep}(n,n)}(M) \leq 1 - \delta$. Let $k$ be a positive integer. Then there exists a matrix $A$ of size $n^{O(k)} \times n^{O(k)}$ such that in case Y, $\|A\|_{2\to4} = 1$, and in case N, $\|A\|_{2\to4} \leq (1 - \delta/2)^k$. Moreover, $A$ can be constructed efficiently from $M$.

Proof of Theorem 9.4. Once Lemma 9.5 is proved, Theorem 2.5 follows from previously known results about the hardness of approximating $h_{\operatorname{Sep}}$. Let $\phi$ be a 3-SAT instance with $n$ variables and $O(n)$ clauses. In Theorem 4 of [GNN] (improving on earlier work of [Gur03]), it was proved that $\phi$ can be reduced to determining whether $h_{\operatorname{Sep}(n',n')}(M)$ is equal to 1 ("case Y") or $\leq 1 - 1/(n \log^c(n))$ ("case N"), where $c > 0$ is a universal constant, and $M$ is an efficiently constructible matrix with $0 \preceq M \preceq I$. Now we apply Lemma 9.5 with $k = 1$ to find that there exists a matrix $A$ of dimension $\operatorname{poly}(n)$ such that in case Y, $\|A\|_{2\to4} = 1$, and in case N, $\|A\|_{2\to4} \leq 1 - 1/(2n \log^c(n))$. Thus, distinguishing these cases would determine whether $\phi$ is satisfiable. This establishes part (1) of Theorem 2.5.

For part (2), we start with Corollary 14 of [HM10], which gives a reduction from determining the satisfiability of $\phi$ to distinguishing between ("case Y") $h_{\operatorname{Sep}(m,m)}(M) = 1$ and ("case N") $h_{\operatorname{Sep}(m,m)}(M) \leq 1/2$. Again $0 \preceq M \preceq I$, and $M$ can be constructed in time $\operatorname{poly}(m)$ from $\phi$, but this time $m = \exp(\sqrt{n}\, \operatorname{polylog}(n))$. Applying Lemma 9.5 in a similar fashion completes the proof. □

Proof of Lemma 9.5. The previous section shows that computing $\|A\|_{2\to4}$ is equivalent to computing $h_{\operatorname{Sep}(n,n)}(A_{2,2})$, for $A_{2,2}$ defined as in (9.13). However, the hardness results of [Gur03, GNN, HM10] produce matrices $M$ that are not in the form of $A_{2,2}$. The reduction of [HM10] comes closest, by producing a matrix that is a sum of terms of the form $xx^* \otimes yy^*$. However, we need a sum of terms of the form $xx^* \otimes xx^*$. This will be achieved by a variant of the protocol used in [HM10].

Let $M_0 \in L(\mathbb{C}^n \otimes \mathbb{C}^n)$ satisfy $0 \preceq M_0 \preceq I$. Consider the promise problem of distinguishing the cases $h_{\operatorname{Sep}(n,n)}(M_0) = 1$ (called "case Y") from $h_{\operatorname{Sep}(n,n)}(M_0) \leq 1/2$ (called "case N"). We show that this reduces to finding a multiplicative approximation for $\|A\|_{2\to4}$ for some real $A$ of dimension $n^\alpha$ for a constant $\alpha > 0$. Combined with known hardness-of-approximation results (Corollary 15 of [HM10]), this will imply Theorem 2.5.

Define $P$ to be the projector onto the subspace of $(\mathbb{C}^n)^{\otimes 4}$ that is invariant under $P_n((1,3))$ and $P_n((2,4))$ (see Section 9.1 for definitions). This can be obtained by applying $P_n((2,3))$ to $\vee^2 \mathbb{C}^n \otimes \vee^2 \mathbb{C}^n$, where we recall that $\vee^2 \mathbb{C}^n$ is the symmetric subspace of $(\mathbb{C}^n)^{\otimes 2}$. Since $P$ projects onto the vectors invariant under the 4-element group generated by $P_n((1,3))$ and $P_n((2,4))$, we can write it as

$$P = \frac{I + P_n((1,3))}{2} \cdot \frac{I + P_n((2,4))}{2}. \tag{9.24}$$

An alternate definition of $P$ is due to Wick's theorem:

$$P = \mathop{\mathbb{E}}_{a,b}\big[aa^* \otimes bb^* \,\hat\otimes\, aa^* \otimes bb^*\big], \tag{9.25}$$

where the expectation is taken over complex-Gaussian-distributed vectors $a, b \in \mathbb{C}^n$ normalized so that $\mathbb{E}\|a\|_2^2 = \mathbb{E}\|b\|_2^2 = n/\sqrt{2}$. Here we use the notation $\hat\otimes$ to mark the separation between systems that we will use to define the separable states $\operatorname{Sep}(n^2, n^2)$. We could equivalently write $P = \mathbb{E}_{a,b}[(aa^* \otimes bb^*)^{\otimes 2}]$. We will find that (9.24) is more useful for doing calculations, while (9.25) is helpful for converting $M_0$ into a form that resembles $A_{2,2}$ for some matrix $A$.
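The projector in (9.24) can be built explicitly in small dimension. This sketch, assuming $n = 2$ and real permutation matrices (all names are illustrative), forms $P$ from (9.24) and checks that it is a symmetric idempotent of trace $(n(n+1)/2)^2$, the dimension of the image of $\vee^2\mathbb{C}^n \otimes \vee^2\mathbb{C}^n$ under $P_n((2,3))$.

```python
import itertools

N = 2  # local dimension n; a small hypothetical choice so P is a 16 x 16 matrix
BASIS = list(itertools.product(range(N), repeat=4))
POS = {idx: r for r, idx in enumerate(BASIS)}
DIM = len(BASIS)


def swap_matrix(i, j):
    """Matrix of P_n((i j)) on (C^n)^{(x)4}, swapping tensor factors i and j (0-indexed)."""
    m = [[0.0] * DIM for _ in range(DIM)]
    for idx in BASIS:
        out = list(idx)
        out[i], out[j] = out[j], out[i]
        m[POS[tuple(out)]][POS[idx]] = 1.0
    return m


def half_identity_plus(m):
    """Returns (I + m) / 2."""
    return [[((1.0 if r == c else 0.0) + m[r][c]) / 2 for c in range(DIM)] for r in range(DIM)]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(DIM)) for c in range(DIM)] for r in range(DIM)]


# (9.24): factors 1..4 of the text are indices 0..3 here.
P = matmul(half_identity_plus(swap_matrix(0, 2)), half_identity_plus(swap_matrix(1, 3)))

trace = sum(P[r][r] for r in range(DIM))
P2 = matmul(P, P)
is_projector = all(abs(P2[r][c] - P[r][c]) < 1e-12 for r in range(DIM) for c in range(DIM))
print(trace, is_projector)  # 9.0 True
```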

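Define $M_1 = (\sqrt{M_0} \otimes \sqrt{M_0})\, P\, (\sqrt{M_0} \otimes \sqrt{M_0})$, where $\sqrt{M_0}$ is taken to be the unique positive-semidefinite square root of $M_0$. Observe that

$$M_1 = \mathop{\mathbb{E}}_{a,b}\big[V_{a,b} \,\hat\otimes\, V_{a,b}\big] = \mathop{\mathbb{E}}_{a,b}\big[V_{a,b}^{\otimes 2}\big], \tag{9.26}$$

where we define $v_{a,b} := \sqrt{M_0}(a \otimes b)$ and $V_{a,b} := v_{a,b} v_{a,b}^*$. We claim that $h_{\operatorname{Sep}}(M_1)$ gives a reasonable proxy for $h_{\operatorname{Sep}}(M_0)$ in the following sense.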



Lemma 9.6.

$$h_{\operatorname{Sep}(n^2,n^2)}(M_1) \begin{cases} = 1 & \text{in case Y} \\ \leq 1 - \delta/2 & \text{in case N.} \end{cases} \tag{9.27}$$

The proof of Lemma 9.6 is deferred to the end of this section. The analysis is very similar to Theorem 13 of [HM10], but much simpler here because $M_0$ acts on only two systems. However, strictly speaking it is not a consequence of the results in [HM10], because that paper considered a slightly different choice of $M_1$.

The advantage of replacing $M_0$ with $M_1$ is that (thanks to (9.25)) we now have a matrix with the same form as $A_{2,2}$ in (9.13), allowing us to make use of Lemma 9.3. However, we first need to amplify the separation between cases Y and N. This is achieved by the matrix $M_2 := M_1^{\otimes k}$. This tensor product is not across the cut we use to define separable states; in other words:

$$M_2 = \mathop{\mathbb{E}}_{\substack{a_1, \ldots, a_k \\ b_1, \ldots, b_k}}\big[(V_{a_1,b_1} \otimes \cdots \otimes V_{a_k,b_k}) \,\hat\otimes\, (V_{a_1,b_1} \otimes \cdots \otimes V_{a_k,b_k})\big] \tag{9.28}$$

Now Lemma 12 from [HM10] implies that $h_{\operatorname{Sep}(n^{2k}, n^{2k})}(M_2) = h_{\operatorname{Sep}(n^2,n^2)}(M_1)^k$. This is either 1 or $\leq (3/4)^k$, depending on whether we have case Y or N.

Finally, we would like to relate this to the 2-to-4 norm of a matrix. It will be more convenient to work with $M_1$, and then take tensor powers of the corresponding matrix. Naively applying Lemma 9.3 would relate $h_{\operatorname{Sep}}(M_1)$ to $\|A\|_{2\to4}$ for an infinite-dimensional $A$. Instead, we first replace the continuous distribution on $a$ (resp. $b$) with a finitely-supported distribution in a way that does not change $\mathbb{E}_a\, aa^* \otimes aa^*$ (resp. $\mathbb{E}_b\, bb^* \otimes bb^*$). Such distributions are called complex-projective (2,2)-designs or quantum (state) 2-designs, and can be constructed from spherical 4-designs on the unit sphere of $\mathbb{C}^n \cong \mathbb{R}^{2n}$ [AE07]. Finding these designs is challenging when each vector needs to have the same weight, but for our purposes we can use Carathéodory's theorem to show that there exist vectors $z_1, \ldots, z_m$ with $m = n^4$ such that

$$\mathop{\mathbb{E}}_a\big[aa^* \otimes aa^*\big] = \sum_{i \in [m]} z_i z_i^* \otimes z_i z_i^*. \tag{9.29}$$

In what follows, assume that the average over $a, b$ used in the definitions of $P, M_1, M_2$ is replaced by the sum over $z_1, \ldots, z_m$. By (9.29) this change does not affect the values of $P, M_1, M_2$.

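For $i, j \in [m]$, define $w_{ij} := \sqrt{M_0}(z_i \otimes z_j)$, and let $e_{ij} := e_i \otimes e_j$. Now we can apply Lemma 9.3 to find that $h_{\operatorname{Sep}}(M_1) = \|A_1\|_{2\to4}^4$, where

$$A_1 = \sum_{i,j \in [m]} e_{ij}\, w_{ij}^*.$$

The amplified matrix $M_2$ similarly satisfies $h_{\operatorname{Sep}(n^{2k}, n^{2k})}(M_2) = \|A_2\|_{2\to4}^4$, where

$$A_2 = A_1^{\otimes k} = \sum_{i_1, \ldots, i_k, j_1, \ldots, j_k \in [m]} (e_{i_1 j_1} \otimes \cdots \otimes e_{i_k j_k})(w_{i_1 j_1} \otimes \cdots \otimes w_{i_k j_k})^*.$$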

The last step is to relate the complex matrix $A_2$ to a real matrix $A_3$ with the same 2-to-4 norm once we restrict to real inputs. This can be achieved by replacing a single complex entry $\alpha + i\beta$ with the $6 \times 2$ real matrix

$$F \begin{pmatrix} \alpha & -\beta \\ \beta & \alpha \end{pmatrix},$$

where $F$ is a fixed $6 \times 2$ real matrix satisfying $\|Fu\|_4 = \|u\|_2$ for every $u \in \mathbb{R}^2$. A complex input $x + iy$ is represented by the column vector $\binom{x}{y}$. The initial $2 \times 2$ matrix maps this to the real representation of $(\alpha + i\beta)(x + iy)$, and then the fixed $6 \times 2$ matrix $F$ maps this to a vector whose 4-norm equals $|(\alpha + i\beta)(x + iy)|$.

□
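The argument only needs some fixed $6 \times 2$ real matrix $F$ with $\|Fu\|_4 = \|u\|_2$ for all $u \in \mathbb{R}^2$. One valid choice, which we assume here and which need not coincide with the matrix used originally, takes the six unit directions at angles $k\pi/6$ ($k = 0, \ldots, 5$) scaled by $(4/9)^{1/4}$: since $\sum_{k=0}^{5} \cos^4(\theta - k\pi/6) = 9/4$ for every $\theta$, the identity holds. The sketch below checks the composite map against $|(\alpha + i\beta)(x + iy)|$.

```python
import math

# Assumed (hypothetical) choice of the fixed 6 x 2 matrix F: six unit directions at
# angles k*pi/6, scaled by (4/9)^(1/4) so that ||F u||_4 = ||u||_2 for all u in R^2.
SCALE = (4.0 / 9.0) ** 0.25
F = [(SCALE * math.cos(k * math.pi / 6), SCALE * math.sin(k * math.pi / 6)) for k in range(6)]


def four_norm_of_f(u):
    """Counting 4-norm of F u for u in R^2."""
    return sum((row[0] * u[0] + row[1] * u[1]) ** 4 for row in F) ** 0.25


def real_block(alpha, beta):
    """2 x 2 real matrix representing multiplication by alpha + i*beta."""
    return [[alpha, -beta], [beta, alpha]]


def embedded_value(alpha, beta, x, y):
    """||F R (x, y)^T||_4, with R the real block of alpha + i*beta."""
    r = real_block(alpha, beta)
    u = (r[0][0] * x + r[0][1] * y, r[1][0] * x + r[1][1] * y)
    return four_norm_of_f(u)


print(embedded_value(0.3, -1.2, 0.7, 0.4), abs(complex(0.3, -1.2) * complex(0.7, 0.4)))
```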

We conclude with the proof of Lemma 9.6, mostly following [HM10].

Proof. Case Y is simplest, and also provides intuition for the choices in the construction of $M_1$. Since the extreme points of $\operatorname{Sep}(n,n)$ are of the form $xx^* \otimes yy^*$ for $x, y \in S(\mathbb{C}^n)$, it follows that there exist $x, y \in S(\mathbb{C}^n)$ with $\langle x \otimes y, M_0(x \otimes y)\rangle = 1$. Since $M_0 \preceq I$, this implies that $M_0(x \otimes y) = x \otimes y$. Thus $\sqrt{M_0}(x \otimes y) = x \otimes y$. Let

$$z = x \otimes y \otimes x \otimes y.$$

Then $z$ is an eigenvector of both $\sqrt{M_0} \otimes \sqrt{M_0}$ and $P$, with eigenvalue 1 in each case. To see this for $P$, we use the definition in (9.24). Thus $\langle z, M_1 z\rangle = 1$, and it follows that $h_{\operatorname{Sep}(n^2,n^2)}(M_1) \geq 1$. On the other hand, $M_1 \preceq I$, implying that $h_{\operatorname{Sep}(n^2,n^2)}(M_1) \leq 1$. This establishes case Y.

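For case N, we assume that $h_{\operatorname{Sep}(n,n)}(M_0) \leq 1 - \delta$, i.e., $\langle x \otimes y, M_0(x \otimes y)\rangle \leq 1 - \delta$ for all $x, y \in S(\mathbb{C}^n)$. The idea of the proof is that for any $x, y \in S(\mathbb{C}^{n^2})$, we must either have $x, y$ close to a product state, in which case the $\sqrt{M_0}$ step will shrink the vector, or, if they are far from a product state and preserved by $\sqrt{M_0} \otimes \sqrt{M_0}$, then the $P$ step will shrink the vector. In either case, the length will be reduced by a dimension-independent factor.

We now spell this argument out in detail. Choose $x, y \in S(\mathbb{C}^{n^2})$ to achieve

$$s := \langle x \otimes y, M_1(x \otimes y)\rangle = h_{\operatorname{Sep}(n^2,n^2)}(M_1). \tag{9.30}$$

Let $X, Y \in L(\mathbb{C}^n)$ be defined by

$$\sqrt{M_0}\, x = \sum_{i,j \in [n]} X_{i,j}\, e_i \otimes e_j \quad \text{and} \quad \sqrt{M_0}\, y = \sum_{i,j \in [n]} Y_{i,j}\, e_i \otimes e_j. \tag{9.31}$$

Note that $\langle X, X\rangle = \langle x, M_0 x\rangle \leq 1$ and similarly for $\langle Y, Y\rangle$. We wish to estimate

$$s = \sum_{i,j,k,l,i',j',k',l' \in [n]} \overline{X_{i',j'}}\, \overline{Y_{k',l'}}\, X_{i,j}\, Y_{k,l}\, \big\langle e_{i'} \otimes e_{j'} \otimes e_{k'} \otimes e_{l'},\ P(e_i \otimes e_j \otimes e_k \otimes e_l)\big\rangle \tag{9.32}$$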
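Using (9.24) we see that the expression inside the $\langle \cdot \rangle$ is

$$\frac{1}{4}\big(\delta_{i',i}\delta_{j',j}\delta_{k',k}\delta_{l',l} + \delta_{i',k}\delta_{j',j}\delta_{k',i}\delta_{l',l} + \delta_{i',i}\delta_{j',l}\delta_{k',k}\delta_{l',j} + \delta_{i',k}\delta_{j',l}\delta_{k',i}\delta_{l',j}\big). \tag{9.33}$$

Rearranging, we find

$$s = \frac{\langle X, X\rangle\langle Y, Y\rangle + \langle X, Y\rangle\langle Y, X\rangle + \langle YY^*, XX^*\rangle + \langle Y^*Y, X^*X\rangle}{4}. \tag{9.34}$$

Using the AM-GM inequality we see that the maximum of this expression is achieved when $X = Y$, in which case we have

$$s = \frac{\langle X, X\rangle^2 + \langle X^*X, X^*X\rangle}{2} \leq \frac{1 + \langle X^*X, X^*X\rangle}{2}. \tag{9.35}$$

Let the singular values of $X$ be $\sigma_1 \geq \cdots \geq \sigma_n$. Observe that $\|\sigma\|_2^2 = \langle X, X\rangle \leq 1$, and thus $\|\sigma\|_4^4 = \langle X^*X, X^*X\rangle \leq \sigma_1^2$. On the other hand,

$$\sigma_1^2 = \max_{a,b \in S(\mathbb{C}^n)} |\langle a, Xb\rangle|^2 \tag{9.36}$$

$$= \max_{a,b \in S(\mathbb{C}^n)} |\langle a \otimes \bar b, \sqrt{M_0}\, x\rangle|^2 \tag{9.37}$$

$$= \max_{a,b \in S(\mathbb{C}^n)} |\langle \sqrt{M_0}(a \otimes \bar b), x\rangle|^2 \tag{9.38}$$

$$\leq \max_{a,b \in S(\mathbb{C}^n)} \langle \sqrt{M_0}(a \otimes \bar b), \sqrt{M_0}(a \otimes \bar b)\rangle \tag{9.39}$$

$$= \max_{a,b \in S(\mathbb{C}^n)} \langle a \otimes b, M_0(a \otimes b)\rangle \tag{9.40}$$

$$= h_{\operatorname{Sep}(n,n)}(M_0) \leq 1 - \delta \tag{9.41}$$

Here (9.39) uses the Cauchy-Schwarz inequality and $\|x\|_2 = 1$. Combining with (9.35), we obtain $s \leq (1 + \sigma_1^2)/2 \leq 1 - \delta/2$, which establishes case N. □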
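The rearrangement from (9.32)-(9.33) to (9.34) is easy to get wrong, so here is a minimal numerical check, assuming $n = 2$ and random complex $X, Y$ normalized so that $\langle X, X\rangle = \langle Y, Y\rangle = 1$ (helper names are ours). It evaluates (9.32) term by term using the delta expression (9.33), compares the result with the closed form (9.34), and also checks the consequence $s \leq (1 + \max(\langle X^*X, X^*X\rangle, \langle Y^*Y, Y^*Y\rangle))/2$ of the AM-GM step.

```python
import itertools
import random

N = 2  # hypothetical small dimension; the identity itself holds for every n


def kd(a, b):
    """Kronecker delta."""
    return 1.0 if a == b else 0.0


def p_entry(ip, jp, kp, lp, i, j, k, l):
    """<e_i' e_j' e_k' e_l', P(e_i e_j e_k e_l)> as in (9.33)."""
    return (kd(ip, i) * kd(jp, j) * kd(kp, k) * kd(lp, l)
            + kd(ip, k) * kd(jp, j) * kd(kp, i) * kd(lp, l)
            + kd(ip, i) * kd(jp, l) * kd(kp, k) * kd(lp, j)
            + kd(ip, k) * kd(jp, l) * kd(kp, i) * kd(lp, j)) / 4


def inner(a, b):
    """<A, B> = Tr A^* B = sum_ij conj(A_ij) B_ij."""
    return sum(a[r][c].conjugate() * b[r][c] for r in range(N) for c in range(N))


def mul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(N)) for c in range(N)] for r in range(N)]


def adj(a):
    return [[a[c][r].conjugate() for c in range(N)] for r in range(N)]


def s_direct(x, y):
    """The sum (9.32), evaluated term by term."""
    total = 0j
    for ip, jp, kp, lp, i, j, k, l in itertools.product(range(N), repeat=8):
        total += (x[ip][jp].conjugate() * y[kp][lp].conjugate() * x[i][j] * y[k][l]
                  * p_entry(ip, jp, kp, lp, i, j, k, l))
    return total


def s_formula(x, y):
    """The closed form (9.34)."""
    return (inner(x, x) * inner(y, y) + inner(x, y) * inner(y, x)
            + inner(mul(y, adj(y)), mul(x, adj(x)))
            + inner(mul(adj(y), y), mul(adj(x), x))) / 4


def random_unit_matrix(rng):
    """Random complex N x N matrix with <X, X> = 1."""
    m = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)] for _ in range(N)]
    norm = abs(inner(m, m)) ** 0.5
    return [[v / norm for v in row] for row in m]


rng = random.Random(0)
X, Y = random_unit_matrix(rng), random_unit_matrix(rng)
print(abs(s_direct(X, Y) - s_formula(X, Y)))  # discrepancy between (9.32) and (9.34)
```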



Remark: It is possible to extend Lemma 9.5 to the situation when case Y has $h_{\operatorname{Sep}}(M) \geq 1 - \delta'$ for some constant $\delta' < \delta$. Since the details are somewhat tedious, and repeat arguments in [HM10], we omit them here.

9.2.1 Hardness of approximation for projectors 

Can Theorem 2.5 give any super-polynomial lower bound for the SSE problem if we assume the Exponential-Time Hypothesis for 3-SAT? To resolve this question using our techniques, we would like to reduce 3-SAT to estimating the 2-to-4 norm of the projector onto the eigenvectors of a graph that have large eigenvalue. We do not know how to do this. Instead, we show that the matrix $A$ constructed in Theorem 2.5 can be taken to be a projector. This is almost without loss of generality, except that the resulting 2-to-4 norm will be at least $3^{1/4}$.

Lemma 9.7. Let $A$ be a linear map from $\mathbb{R}^d$ to $\mathbb{R}^n$ and let $0 < c < C$, $\varepsilon > 0$ be some numbers. Then there is $m = O(n^2/\varepsilon^2)$ and a map $A'$ from $\mathbb{R}^d$ to $\mathbb{R}^m$ such that $\sigma_{\min}(A') \geq 1 - \varepsilon$ and (i) if $\|A\|_{2\to4} \leq c$ then $\|A'\|_{2\to4} \leq 3^{1/4} + \varepsilon$; (ii) if $\|A\|_{2\to4} \geq C$ then $\|A'\|_{2\to4} \geq \Omega(\varepsilon C/c)$.

Proof. We let $B$ be a random map from $\mathbb{R}^d$ to $\mathbb{R}^N$, where $N = O(n^2/\delta^2)$, with entries that are i.i.d. Gaussians with mean zero and variance $1/\sqrt{N}$. Then Dvoretzky's theorem [Pis99] implies that for every $f \in \mathbb{R}^d$, $\|Bf\|_4 \in 3^{1/4}(1 \pm \delta)\|f\|_2$. Consider the operator $A'$ that maps $f$ into the concatenation of $Af$ and $Bf$. Moreover, we take multiple copies of each coordinate so that the measure of output coordinates of $A'$ corresponding to $A$ is $\alpha = \delta/c^4$, while the measure of coordinates corresponding to $B$ is $1 - \alpha$.

Now for every function $f$, we get that $\|A'f\|_4^4 = \alpha\|Af\|_4^4 + (1 - \alpha)\|Bf\|_4^4$. In particular, since $\|Bf\|_4^4 \in 3(1 \pm \delta)^4\|f\|_2^4$, we get that if $f$ is a unit vector and $\|Af\|_4^4 \leq c^4$ then $\|A'f\|_4^4 \leq \delta + 3(1 + \delta)^4$, while if $\|Af\|_4^4 \geq C^4$, then $\|A'f\|_4^4 \geq \delta(C/c)^4$.

Also note that the random operator $B$ will satisfy that for every function $f$, $\|Bf\|_2 \geq (1 - \delta)\|f\|_2$, and hence $\|A'f\|_2 \geq (1 - \alpha)(1 - \delta)\|f\|_2$. Choosing $\delta = \varepsilon/2$ concludes the proof. □

It turns out that for the purposes of hardness of good approximation, the case that A is 
a projector is almost without loss of generality. 

Lemma 9.8. Suppose that for some $\varepsilon > 0$, $C > 1 + \varepsilon$ there is a $\operatorname{poly}(n)$ algorithm that on input a subspace $V \subseteq \mathbb{R}^n$ can distinguish between the case (Y) $\|\Pi_V\|_{2\to4} \geq C$ and the case (N) $\|\Pi_V\|_{2\to4} \leq 3^{1/4} + \varepsilon$, where $\Pi_V$ denotes the projector onto $V$. Then there is $\delta = \Omega(\varepsilon)$ and a $\operatorname{poly}(n)$ algorithm that on input an operator $A : \mathbb{R}^d \to \mathbb{R}^n$ with $\sigma_{\min}(A) \geq 1 - \delta$ can distinguish between the case (Y) $\|A\|_{2\to4} \geq C(1 + \delta)$ and (N) $\|A\|_{2\to4} \leq 3^{1/4}(1 + \delta)$.

Proof. First, we can assume without loss of generality that $\|A\|_{2\to2} = \sigma_{\max}(A) \leq 1 + \delta$, since otherwise we could rule out case (N). Now we let $V$ be the image of $A$. In the case (N) we get that for every $f \in \mathbb{R}^d$

$$\|Af\|_4 \leq 3^{1/4}(1 + \delta)\|f\|_2 \leq 3^{1/4}(1 + \delta)\|Af\|_2/\sigma_{\min}(A) \leq 3^{1/4}(1 + O(\delta))\|Af\|_2,$$

implying $\|\Pi_V\|_{2\to4} \leq 3^{1/4} + O(\delta)$. In the case (Y) we get that there is some $f$ such that $\|Af\|_4 \geq C(1 + \delta)\|f\|_2$, but since $\|Af\|_2 \leq \sigma_{\max}(A)\|f\|_2$, we get that $\|Af\|_4 \geq C\|Af\|_2$, implying $\|\Pi_V\|_{2\to4} \geq C$. □

Together these two lemmas effectively extend Theorem 2.5 to the case when A is a 
projector. We focus on the hardness of approximating to within a constant factor. 

Corollary 9.9. For any $t, \varepsilon > 0$, if $\phi$ is a 3-SAT instance with $n$ variables and $O(n)$ clauses, then determining satisfiability of $\phi$ can be reduced to distinguishing between the cases $\|A\|_{2\to4} \leq 3^{1/4} + \varepsilon$ and $\|A\|_{2\to4} \geq t$, where $A$ is a projector acting on $m = \exp(\sqrt{n}\, \operatorname{polylog}(n) \log(t/\varepsilon))$ dimensions.

Proof. Start as in the proof of Theorem 2.5, but in the application of Lemma 9.5, take $k = O(\log(t/\varepsilon))$. This will allow us to take $C/c = \Omega(t/\varepsilon)$ in Lemma 9.7. Translating into a projector with Lemma 9.8, we obtain the desired result. □

9.3 Algorithmic applications of equivalent formulations 

In this section we discuss the positive algorithmic results that come from the equivalences in Section 9.1. Since entanglement plays such a central role in quantum mechanics, the set $\operatorname{Sep}^2(\mathbb{C}^n)$ has been extensively studied. However, because its hardness has long been informally recognized (and more recently has been explicitly established [Gur03, Liu07, HM10, GNN]), various relaxations have been proposed for the set. These relaxations are generally efficiently computable, but also have limited accuracy; see [BS10] for a review.

Two of the most important relaxations are the PPT condition and $r$-extendability. For an operator $X \in L((\mathbb{C}^n)^{\otimes r})$ and a set $S \subseteq [r]$, define the partial transpose $X^{T_S}$ to be the result of applying the transpose map to the systems in $S$. Formally, we define

$$\Big(\bigotimes_{k=1}^r X_k\Big)^{T_S} := \bigotimes_{k=1}^r f_k(X_k), \qquad f_k(M) := \begin{cases} M & \text{if } k \notin S \\ M^T & \text{if } k \in S \end{cases}$$

and extend $T_S$ linearly to all of $L((\mathbb{C}^n)^{\otimes r})$. One can verify that if $X \in \operatorname{Sep}^r(\mathbb{C}^n)$ then $X^{T_S} \succeq 0$ for all $S \subseteq [r]$. In this case we say that $X$ is PPT, meaning that it has Positive Partial Transposes. However, the converse is not always true. If $n > 2$ or $r > 2$, then there are states which are PPT but not in Sep [HHH96].
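A standard example of what the PPT test detects: for $n = r = 2$, the partial transpose of the maximally entangled state $\Phi\Phi^*$ with $\Phi = (e_1 \otimes e_1 + e_2 \otimes e_2)/\sqrt{2}$ equals half the swap operator, which has eigenvalue $-1/2$ on the antisymmetric vector. The sketch below (real entries only, hypothetical helper names) exhibits this negative value and checks that a product state is unchanged by the partial transpose.

```python
import math


def idx(a, b, n):
    """Row/column index of e_a (x) e_b in C^n (x) C^n."""
    return a * n + b


def partial_transpose_b(rho, n):
    """rho^{T_B}: transpose only the second factor, <i j|rho^{T_B}|k l> = <i l|rho|k j>."""
    d = n * n
    out = [[0.0] * d for _ in range(d)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for l in range(n):
                    out[idx(i, j, n)][idx(k, l, n)] = rho[idx(i, l, n)][idx(k, j, n)]
    return out


def outer(v):
    """vv^T for a real vector v (real entries keep the sketch short)."""
    return [[a * b for b in v] for a in v]


def quad(m, w):
    """w^T m w."""
    return sum(w[r] * m[r][c] * w[c] for r in range(len(w)) for c in range(len(w)))


n = 2
h = 1 / math.sqrt(2)
phi = [h, 0.0, 0.0, h]      # (e_1 e_1 + e_2 e_2) / sqrt(2)
witness = [0.0, h, -h, 0.0]  # (e_1 e_2 - e_2 e_1) / sqrt(2)
print(quad(partial_transpose_b(outer(phi), n), witness))  # about -0.5, so not PPT
```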

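The second important relaxation of Sep is called $r$-extendability. To define this, we need to introduce the partial trace. For $S \subseteq [r]$, we define $\operatorname{Tr}_S$ to be the map from $L((\mathbb{C}^n)^{\otimes r})$ to $L((\mathbb{C}^n)^{\otimes (r - |S|)})$ that results from applying $\operatorname{Tr}$ to the systems in $S$. Formally,

$$\operatorname{Tr}_S \bigotimes_{k=1}^r X_k = \prod_{k \in S} \operatorname{Tr} X_k \cdot \bigotimes_{k \notin S} X_k,$$

and $\operatorname{Tr}_S$ extends by linearity to all of $L((\mathbb{C}^n)^{\otimes r})$.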

To obtain our relaxation of Sep, we say that $\rho \in D(\mathbb{C}^n \otimes \mathbb{C}^n)$ is $r$-extendable if there exists a symmetric extension $\sigma \in D(\mathbb{C}^n \otimes \vee^r \mathbb{C}^n)$ such that $\operatorname{Tr}_{\{3, \ldots, r+1\}} \sigma = \rho$. Observe that if $\rho \in \operatorname{Sep}^2(\mathbb{C}^n)$, then we can write $\rho = \sum_i p_i\, x_i x_i^* \otimes y_i y_i^*$ with $p_i \geq 0$, $\sum_i p_i = 1$ and unit vectors $x_i, y_i$, and so $\sigma = \sum_i p_i\, x_i x_i^* \otimes (y_i y_i^*)^{\otimes r}$ is a valid symmetric extension. Thus the set of $r$-extendable states contains the set of separable states, but again the inclusion is strict. Indeed, increasing $r$ gives an infinite hierarchy of strictly tighter approximations of $\operatorname{Sep}^2(\mathbb{C}^n)$. This hierarchy ultimately converges [DPS04], although not always at a useful rate (see Example IV.1 of [CKMR07]). Interestingly, this relaxation is known to completely fail as a method of approximating $\operatorname{Sep}^2(\mathbb{R}^n)$ [CFS02], but our Lemma 9.3 is evidence that those difficulties do not arise in the 2-to-4-norm problem.
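The symmetric extension of a separable state described above can be verified directly. This sketch assumes $n = r = 2$, a hypothetical two-term separable $\rho$ built from real unit vectors, and implements the partial trace over the last factor; it checks that $\operatorname{Tr}_{\{3\}}\sigma = \rho$, that $\operatorname{Tr}\sigma = 1$, and that $\sigma$ is invariant under exchanging $B_1$ and $B_2$.

```python
def outer(u):
    """uu^T for a real vector u."""
    return [[a * b for b in u] for a in u]


def kron(a, b):
    """Kronecker product of square matrices: (A (x) B)[(i,k),(j,l)] = A[i][j] * B[k][l]."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def add(a, b, w=1.0):
    """Returns a + w * b."""
    return [[p + w * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def partial_trace_last(m, d_rest, n):
    """Trace out the last C^n factor of an operator on C^{d_rest} (x) C^n."""
    return [[sum(m[r * n + t][c * n + t] for t in range(n)) for c in range(d_rest)]
            for r in range(d_rest)]


# Hypothetical separable state rho = sum_i p_i x_i x_i^T (x) y_i y_i^T on C^2 (x) C^2.
terms = [(0.25, [1.0, 0.0], [0.6, 0.8]), (0.75, [0.0, 1.0], [0.8, -0.6])]
n = 2
rho = [[0.0] * 4 for _ in range(4)]
sigma = [[0.0] * 8 for _ in range(8)]
for p, x, y in terms:
    rho = add(rho, kron(outer(x), outer(y)), p)
    sigma = add(sigma, kron(kron(outer(x), outer(y)), outer(y)), p)  # r = 2 extension

reduced = partial_trace_last(sigma, 4, n)
print(max(abs(reduced[r][c] - rho[r][c]) for r in range(4) for c in range(4)))
```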

These two relaxations can be combined to optimize over symmetric extensions that have positive partial transposes [DPS04]. Call this the level-$r$ DPS relaxation. It is known to converge in some cases more rapidly than $r$-extendability alone [NOP09], but it is never exact for any finite $r$ [DPS04]. Like SoS, this relaxation is an SDP with size $n^{O(r)}$. In fact, for the case of the 2-to-4 norm, the relaxations are equivalent.

Lemma 9.10. When the level-$r$ DPS relaxation is applied to $A_{2,2}$, the resulting approximation is equivalent to Tensor-SDP$^{(2r+2)}$.

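Proof. Suppose we are given an optimal solution to the level-$r$ DPS relaxation. This can be thought of as a density operator $\sigma \in D(\mathbb{C}^n \otimes \vee^r \mathbb{C}^n)$ whose objective value is $\lambda := \langle A_{2,2}, \operatorname{Tr}_{\{3, \ldots, r+1\}} \sigma\rangle = \langle A_{2,2} \otimes I_n^{\otimes (r-1)}, \sigma\rangle$. Let $\Pi_{\rm sym}^{(2)} := (I + P_n((1,2)))/2$ be the orthogonal projector onto $\vee^2 \mathbb{C}^n$. Then $A_{2,2} = \Pi_{\rm sym}^{(2)} A_{2,2} \Pi_{\rm sym}^{(2)}$. Thus, we can replace $\sigma$ by $\sigma' := (\Pi_{\rm sym}^{(2)} \otimes I_n^{\otimes (r-1)})\, \sigma\, (\Pi_{\rm sym}^{(2)} \otimes I_n^{\otimes (r-1)})$ without changing the objective function. However, unless $\sigma' = \sigma$, we will have $\operatorname{Tr}\sigma' < 1$. In this case, either $\sigma' = 0$ and $\lambda = 0$, or $\sigma'/\operatorname{Tr}\sigma'$ is a solution of the DPS relaxation with a higher objective value. In either case, this contradicts the assumption that $\lambda$ is the optimal value. Thus, we must have $\sigma = \sigma'$, and in particular $\operatorname{supp}\sigma \subseteq \vee^2 \mathbb{C}^n \otimes (\mathbb{C}^n)^{\otimes (r-1)}$. Since we had $\operatorname{supp}\sigma \subseteq \mathbb{C}^n \otimes \vee^r \mathbb{C}^n$ by assumption, it follows that

$$\operatorname{supp}\sigma \subseteq \big(\vee^2 \mathbb{C}^n \otimes (\mathbb{C}^n)^{\otimes (r-1)}\big) \cap \big(\mathbb{C}^n \otimes \vee^r \mathbb{C}^n\big) = \vee^{r+1} \mathbb{C}^n.$$

Observe next that $\sigma^T$ is also a valid and optimal solution to the DPS relaxation, and so $\sigma' = (\sigma + \sigma^T)/2$ is as well. Since $\sigma'$ is both symmetric and Hermitian, it must be a real matrix. Replacing $\sigma$ with $\sigma'$, we see that we can assume WLOG that $\sigma$ is real.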

Similarly, the PPT condition implies that $\sigma^{T_A} \succeq 0$. (Recall that the first system is $A$ and the rest are $B_1, \ldots, B_r$.) Since the partial transpose doesn't change the objective function, $\sigma' = (\sigma + \sigma^{T_A})/2$ is also an optimal solution. Replacing $\sigma$ with $\sigma'$, we see that we can assume WLOG that $\sigma = \sigma^{T_A}$. Let $\tilde\sigma \in (\mathbb{R}^n)^{\otimes (2r+2)}$ denote the flattening of $\sigma$; i.e. $\langle x \otimes y, \tilde\sigma\rangle = \langle x, \sigma y\rangle$ for all $x, y \in (\mathbb{R}^n)^{\otimes (r+1)}$. Then the fact that $\sigma = \sigma^{T_A}$ means that $\tilde\sigma$ is invariant under the action of $P_n((1, r+2))$, which exchanges the row and column indices of system $A$. Similarly, the fact that $\operatorname{supp}\sigma \subseteq \vee^{r+1} \mathbb{R}^n$ implies that $\tilde\sigma \in \vee^{r+1} \mathbb{R}^n \otimes \vee^{r+1} \mathbb{R}^n$. Combining these two facts we find that $\tilde\sigma \in \vee^{2r+2} \mathbb{R}^n$.

Now that $\tilde\sigma$ is fully symmetric under exchange of all $2r+2$ indices, we can interpret it as a real-valued pseudo-expectation $\tilde{\mathbb{E}}_\sigma$ for polynomials of degree $2r+2$. More precisely, we can define the linear map coeff that sends homogeneous degree-$(2r+2)$ polynomials to $\vee^{2r+2} \mathbb{R}^n$ by its action on monomials:

$$\operatorname{coeff}(f_1^{\alpha_1} \cdots f_n^{\alpha_n}) := \Pi_{\rm sym}^{(2r+2)}\big(e_1^{\otimes \alpha_1} \otimes \cdots \otimes e_n^{\otimes \alpha_n}\big), \tag{9.42}$$

where $\Pi_{\rm sym}^{(2r+2)} := \frac{1}{(2r+2)!} \sum_{\pi \in S_{2r+2}} P_n(\pi)$. For a homogeneous polynomial $Q(f)$ of even degree $2r' < 2r+2$ we define coeff by

$$\operatorname{coeff}(Q(f)) := \operatorname{coeff}\big(Q(f) \cdot \|f\|_2^{2r+2-2r'}\big).$$

For a homogeneous polynomial $Q(f)$ of odd degree, we set $\operatorname{coeff}(Q) := 0$. Then we can extend coeff by linearity to all polynomials of degree $\leq 2r+2$. Now define

$$\tilde{\mathbb{E}}_\sigma[Q] := \langle \operatorname{coeff}(Q), \tilde\sigma\rangle.$$

We claim that this is a valid pseudo-expectation. For normalization, observe that $\tilde{\mathbb{E}}[1] = \langle \operatorname{coeff}(\|f\|_2^{2r+2}), \tilde\sigma\rangle = \operatorname{Tr}\sigma = 1$. Similarly, the Tensor-SDP constraint $\tilde{\mathbb{E}}[(\|f\|_2^2 - 1)^2] = 0$ is satisfied by our definition of coeff. Linearity follows from the linearity of coeff and the inner product. For positivity, consider a polynomial $Q(f)$ of degree $\leq r+1$. Write $Q = Q_o + Q_e$, where $Q_o$ collects all monomials of odd degree and $Q_e$ collects all monomials of even degree (i.e. $Q_{e/o} = (Q(f) \pm Q(-f))/2$). Then $\tilde{\mathbb{E}}[Q^2] = \tilde{\mathbb{E}}[Q_o^2] + \tilde{\mathbb{E}}[Q_e^2]$, using the property that the pseudo-expectation of a monomial of odd degree is zero.

Consider first $\tilde{\mathbb{E}}[Q_e^2]$. Let $r' = 2\lfloor (r+1)/2 \rfloor$ (i.e. $r'$ is $r+1$ rounded down to the nearest even number), so that $Q_e = \sum_{i=0}^{r'/2} Q_{2i}$, where $Q_{2i}$ is homogeneous of degree $2i$. Define $Q'_e := \sum_{i=0}^{r'/2} Q_{2i} \|f\|_2^{r' - 2i}$. Observe that $Q'_e$ is homogeneous of degree $r' \leq r+1$, and that $\tilde{\mathbb{E}}[Q_e^2] = \tilde{\mathbb{E}}[(Q'_e)^2]$. Next, define $\operatorname{coeff}'$ to map homogeneous polynomials of degree $r'$ into $\vee^{r'} \mathbb{R}^n$ by replacing $2r+2$ in (9.42) with $r'$. If $r' = r+1$ then define $\sigma' = \sigma$, or if $r' = r$ then define $\sigma' = \operatorname{Tr}_A \sigma$. Thus $\sigma'$ acts on $r'$ systems. Define $\tilde\sigma' \in \vee^{2r'} \mathbb{R}^n$ to be the flattened version of $\sigma'$. Finally we can calculate

$$\tilde{\mathbb{E}}[Q_e^2] = \tilde{\mathbb{E}}[(Q'_e)^2] = \langle \operatorname{coeff}'(Q'_e) \otimes \operatorname{coeff}'(Q'_e), \tilde\sigma'\rangle = \langle \operatorname{coeff}'(Q'_e), \sigma' \operatorname{coeff}'(Q'_e)\rangle \geq 0.$$

A similar argument establishes that $\tilde{\mathbb{E}}[Q_o^2] \geq 0$ as well. This establishes that any optimal solution to the DPS relaxation translates into a solution of the Tensor-SDP relaxation.

To translate a Tensor-SDP solution into a DPS solution, we run this construction in reverse. The arguments are essentially the same, except that we no longer need to establish symmetry across all $2r+2$ indices. □

9.3.1 Approximation guarantees and the proof of Theorem 2.3 

Many approximation guarantees for the $k$-extendable relaxation (with or without the additional PPT constraints) required that $k$ be $\operatorname{poly}(n)$, and thus do not lead to useful algorithms. Recently, [BaCY11] showed that in some cases it sufficed to take $k = O(\log n)$, leading to quasi-polynomial algorithms. It is far from obvious that their proof translates into our sum-of-squares framework, but nevertheless Lemma 9.10 implies that Tensor-SDP can take advantage of their analysis.

To apply the algorithm of [BaCY11], we need to upper-bound $A_{2,2}$ by a 1-LOCC measurement operator, that is, a quantum measurement that can be implemented by one-way Local Operations and Classical Communication (LOCC). Such a measurement should have a decomposition of the form $\sum_i V_i \otimes W_i$, where each $V_i, W_i \succeq 0$, $\sum_i V_i \preceq I_n$ and each $W_i \preceq I_n$. Thus, for complex vectors $v_1, \ldots, v_m, w_1, \ldots, w_m$ satisfying $\sum_i v_i v_i^* \preceq I_n$ and $w_i w_i^* \preceq I_n$ for all $i$, the operator $\sum_i v_i v_i^* \otimes w_i w_i^*$ is a 1-LOCC measurement.

To upper-bound $A_{2,2}$ by a 1-LOCC measurement, we note that $a_i a_i^T \preceq \|a_i\|_2^2 I_n$. Thus, if we define $Z := \|\sum_i a_i a_i^T\|_{2\to2} \max_i \|a_i\|_2^2$, then $A_{2,2}/Z$ is a 1-LOCC measurement. Note that this is a stricter requirement than merely requiring $A_{2,2}/Z \preceq I_{n^2}$. On the other hand, in some cases (e.g. $a_i$ all orthogonal), it may be too pessimistic.

In terms of the original matrix $A = \sum_i e_i a_i^T$, we have $\max_i \|a_i\|_2 = \|A\|_{2\to\infty}$. Also $\|\sum_i a_i a_i^T\|_{2\to2} = \|A^T A\|_{2\to2} = \|A\|_{2\to2}^2$. Thus

$$Z = \|A\|_{2\to2}^2 \|A\|_{2\to\infty}^2.$$

Recall from the introduction that $Z$ is an upper bound on $\|A\|_{2\to4}^4$, based on the fact that $\|x\|_4 \leq \sqrt{\|x\|_2 \|x\|_\infty}$ for any $x$. (This bound also arises from using interpolation of norms [Ste56].)
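The bound $\|A\|_{2\to4}^4 \leq Z$ is easy to test numerically. The sketch below assumes a hypothetical $4 \times 2$ matrix $A$ (two columns, so $\|A\|_{2\to2}^2$ has a closed form) and checks $\|Ax\|_4^4 \leq Z\|x\|_2^4$ on random inputs; the inequality follows from $\|Ax\|_4^4 \leq \|Ax\|_2^2\|Ax\|_\infty^2$ together with $|\langle a_i, x\rangle| \leq \|a_i\|_2\|x\|_2$.

```python
import math
import random

# Hypothetical 4 x 2 example matrix A with rows a_i.
A = [(1.0, 0.5), (-0.3, 2.0), (0.7, 0.7), (0.0, -1.1)]


def two_to_two_sq(a):
    """||A||_{2->2}^2 = top eigenvalue of A^T A (closed form, A has two columns)."""
    p = sum(r[0] * r[0] for r in a)
    q = sum(r[0] * r[1] for r in a)
    s = sum(r[1] * r[1] for r in a)
    return (p + s + math.sqrt((p - s) ** 2 + 4 * q * q)) / 2


def two_to_inf_sq(a):
    """||A||_{2->inf}^2 = max_i ||a_i||_2^2, the largest squared row norm."""
    return max(r[0] * r[0] + r[1] * r[1] for r in a)


def ratio_to_z(a, x):
    """||Ax||_4^4 / (Z ||x||_2^4) with Z = ||A||_{2->2}^2 ||A||_{2->inf}^2."""
    ax = [r[0] * x[0] + r[1] * x[1] for r in a]
    z = two_to_two_sq(a) * two_to_inf_sq(a)
    return sum(v ** 4 for v in ax) / (z * (x[0] ** 2 + x[1] ** 2) ** 2)


rng = random.Random(1)
worst = max(ratio_to_z(A, (rng.gauss(0, 1), rng.gauss(0, 1))) for _ in range(10000))
print(worst)  # at most 1, as the bound guarantees
```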

We can now apply the argument of [BaCY11] and show that optimizing over $O(r)$-extendable states will approximate $\|A\|_{2\to4}^4$ up to additive error $\sqrt{\log(n)/r}\; Z$. Equivalently, we can obtain additive error $\varepsilon Z$ using $O(\log(n)/\varepsilon^2)$-round Tensor-SDP. Whether the relaxation used is the DPS relaxation or our SoS-based Tensor-SDP algorithm, the resulting runtime is $\exp(O(\log^2(n)/\varepsilon^2))$.

9.3.2 Gap instances 

Since Tensor-SDP is equivalent to the DPS relaxation for separable states, any gap instance for Tensor-SDP would translate into a gap instance for the DPS relaxation. This would mean the existence of a state that passes the $r$-extendability and PPT tests, but nevertheless is far from separable, with $A_{2,2}$ serving as the entanglement witness demonstrating this. While such states are already known [DPS04, BS10], it would be of interest to find new such families of states, possibly with different scaling of $r$ and $n$.

Our results, though, can be used to give an asymptotic separation of the DPS hierarchy from the $r$-extendability hierarchy. (As a reminder, the DPS hierarchy demands that a state not only have an extension to $r+1$ parties, but also that the extension be PPT across any cut.) To state this more precisely, we introduce some notation. Define $\mathrm{DPS}_r$ to be the set of states $\rho^{AB}$ for which there exists an extension $\tilde\rho^{AB_1 \ldots B_r}$ with support in $\mathbb{C}^n \otimes \vee^r \mathbb{C}^n$ (i.e. a symmetric extension) such that $\tilde\rho$ is invariant under taking the partial transpose of any system. Define $\mathrm{Ext}_r$ to be the set of states on $AB$ with symmetric extensions to $AB_1 \ldots B_r$ but without any requirement about the partial transpose. Both $h_{\mathrm{DPS}_r}$ and $h_{\mathrm{Ext}_r}$ can be computed in time $n^{O(r)}$, although in practice $h_{\mathrm{Ext}_r}(M)$ is easier to work with, since it only requires computing the top eigenvalue of $M \otimes I_n^{\otimes (r-1)}$ restricted to $\mathbb{C}^n \otimes \vee^r \mathbb{C}^n$ and does not require solving an SDP.

Many of the results about the convergence of $\mathrm{DPS}_r$ to Sep (such as [DPS04, CKMR07, KM09, BaCY11]) use only the fact that $\mathrm{DPS}_r \subseteq \mathrm{Ext}_r$. A rare exception is [NOP09], which shows that $\mathrm{DPS}_r$ is at least quadratically closer to Sep than $\mathrm{Ext}_r$ is, in the regime where $r \gg n$. Another simple example comes from $M = \Phi\Phi^*$, where $\Phi$ is the maximally entangled state $n^{-1/2} \sum_{i=1}^n e_i \otimes e_i$. Then one can readily compute that $h_{\operatorname{Sep}}(M) = h_{\mathrm{DPS}_1}(M) = 1/n$, while the $r$-extendable state

$$\frac{1}{r} \sum_{i=1}^{r} (\Phi\Phi^*)_{A B_i} \otimes \bigotimes_{j \neq i} \Big(\frac{I_n}{n}\Big)_{B_j} \tag{9.43}$$

achieves $h_{\mathrm{Ext}_r}(M) \geq 1/r$. (In words, (9.43) describes a state where $A$ and a randomly chosen $B_i$ share the state $\Phi\Phi^*$, while the other $B_j$ systems are described by maximally mixed states.) This proves that the $r$-extendable hierarchy cannot achieve a good multiplicative approximation of $h_{\operatorname{Sep}}(M)$ for all $M$ without taking $r \geq \Omega(n)$.
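The numbers in this example can be reproduced with a few lines. For (9.43), tracing out $B_2, \ldots, B_r$ leaves (by our own computation) $\frac{1}{r}\Phi\Phi^* + (1 - \frac{1}{r})\frac{I}{n^2}$ on $AB_1$, so the objective value is $\frac{1}{r} + (1 - \frac{1}{r})\frac{1}{n^2} \geq \frac{1}{r}$. The sketch below, restricted to real product vectors for simplicity (an assumption; complex vectors give the same maximum), samples $\langle x \otimes y, \Phi\Phi^*(x \otimes y)\rangle = \langle x, y\rangle^2/n$ to illustrate $h_{\operatorname{Sep}}(M) = 1/n$, and evaluates the extension value.

```python
import math
import random


def sampled_h_sep(n, trials, rng):
    """Largest sampled <x (x) y, Phi Phi^* (x (x) y)> = <x, y>^2 / n over random
    real unit vectors x, y, where Phi = n^{-1/2} sum_i e_i (x) e_i."""
    best = 0.0
    for _ in range(trials):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        nx = math.sqrt(sum(v * v for v in x))
        ny = math.sqrt(sum(v * v for v in y))
        overlap = sum(a * b for a, b in zip(x, y)) / (nx * ny)
        best = max(best, overlap ** 2 / n)
    return best


def ext_value(n, r):
    """Objective value <Phi Phi^*, rho_{A B_1}> of the state (9.43)."""
    return 1 / r + (1 - 1 / r) / n ** 2


rng = random.Random(2)
n, r = 4, 2
print(sampled_h_sep(n, 5000, rng), 1 / n, ext_value(n, r))
```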

Can we improve this when $M$ is in a restricted class, such as 1-LOCC? Here [BRSdW11] show that the Khot-Vishnoi integrality construction can yield an $n^2$-dimensional $M$ for which $h_{\operatorname{Sep}}(M) \leq O(1/n)$, but $\operatorname{Tr}(M\Phi\Phi^*) \geq \Omega(1/\log^2(n))$. Combined with (9.43) this implies that $h_{\mathrm{Ext}_r}(M) \geq \Omega(1/(r\log^2(n)))$. On the other hand, Theorem 6.12 and Lemma 9.10 imply that $h_{\mathrm{DPS}_3}(M) \leq O(1/n)$. Additionally, the $M$ from Ref. [BRSdW11] belongs to the class BELL, a subset of 1-LOCC, given by measurements of the form $\sum_{i,j} p_{i,j} A_i \otimes B_j$, with $0 \leq p_{i,j} \leq 1$ and $\sum_i A_i = \sum_j B_j = I$. As a result, we obtain the following corollary.

Corollary 9.11. There exists an $n^2$-dimensional $M \in$ BELL such that

$$h_{\mathrm{DPS}_3}(M) \leq O(1/n) \quad \text{while} \quad h_{\mathrm{Ext}_r}(M) \geq \Omega\Big(\frac{1}{r\log^2(n)}\Big).$$

10 Subexponential algorithm for the 2-to-q norm 

In this section we prove Theorem 2.1: 

Theorem (Restatement of Theorem 2.1). For every $1 < c < C$, there is a $\operatorname{poly}(n)\exp(n^{2/q})$-time algorithm that computes a $(c, C)$-approximation for the $2 \to q$ norm of any linear operator whose range is $\mathbb{R}^n$.

and obtain as a corollary a subexponential algorithm for Small-Set Expansion. The 
algorithm roughly matches the performance of [ABSlOJ's for the same problem, and in 
fact is a very close variant of it. The proof is obtained by simply noticing that a subspace 
V cannot have too large of a dimension without containing a vector v (that can be easily 
found) such that » ||ii||2, while of course it is always possible to find such a vector 
(if it exists) in time exponential in dim(y). The key observation is the following basic fact 
(whose proof we include here for completeness): 

Lemma 10.1. For every subspace $V \subseteq \mathbb{R}^n$, $\|V\|_{2\to\infty} \geq \sqrt{\dim(V)}$.

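Proof. Let $f^1, \ldots, f^d$ be an orthonormal basis for $V$, where $d = \dim(V)$. For every $i \in [n]$, let $g^i$ be the function $\sum_{j=1}^d f^j_i f^j$. Note that the $i$-th coordinate of $g^i$ is equal to $\sum_{j=1}^d (f^j_i)^2$ $(*)$, which also equals $\|g^i\|_2^2$ since the $f^j$'s are an orthonormal basis. Also, the expectation of $(*)$ over $i$ is $\sum_{j=1}^d \mathbb{E}_i (f^j_i)^2 = \sum_{j=1}^d \|f^j\|_2^2 = d$, since these are unit vectors. Thus we get that $\mathbb{E}_i\|g^i\|_\infty \geq \mathbb{E}_i g^i_i = d = \mathbb{E}_i\|g^i\|_2^2$. We claim that one of the $g^i$'s must satisfy $\|g^i\|_\infty \geq \sqrt{d}\,\|g^i\|_2$. Indeed, suppose otherwise; then we would get that

$$d = \mathbb{E}_i\|g^i\|_2^2 > \mathbb{E}_i\|g^i\|_\infty^2/d,$$

meaning $\mathbb{E}_i\|g^i\|_\infty^2 < d^2$, but $\mathbb{E}_i\|g^i\|_\infty^2 \geq (\mathbb{E}_i\|g^i\|_\infty)^2 \geq d^2$, a contradiction. □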
Corollary 10.2. For every subspace $V \subseteq \mathbb{R}^n$, $\|V\|_{2\to q} \geq \sqrt{\dim(V)}/n^{1/q}$.

Proof. By looking at the contribution to the $q$-norm of just one coordinate, one can see that for every function $f$, $\|f\|_q \geq (\|f\|_\infty^q/n)^{1/q} = \|f\|_\infty/n^{1/q}$. Combining this with Lemma 10.1 gives the claim. □
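The proof of Lemma 10.1 is constructive: among the vectors $g^i$, the one with the largest diagonal entry $g^i_i$ is a witness. The sketch below (all names hypothetical) checks this, and the resulting bound of Corollary 10.2, on a random subspace. It assumes, as the lemma requires, that norms and inner products are expectation-normalized ($\|f\|_p^p = \mathbb{E}_i|f_i|^p$); with counting norms the statement would fail, since then $\|V\|_{2\to\infty} \leq 1$.

```python
import math
import random


def inner(u, v):
    """Expectation-normalized inner product <u, v> = E_i u_i v_i."""
    return sum(a * b for a, b in zip(u, v)) / len(u)


def norm_p(v, p):
    """Expectation-normalized p-norm (E_i |v_i|^p)^(1/p)."""
    return (sum(abs(x) ** p for x in v) / len(v)) ** (1.0 / p)


def orthonormal_basis(vectors):
    """Gram-Schmidt with respect to the expectation inner product."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = inner(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        length = math.sqrt(inner(w, w))
        if length > 1e-12:  # drop (numerically) dependent vectors
            basis.append([wi / length for wi in w])
    return basis


def lemma_10_1_witness(basis):
    """Return max_i ||g^i||_inf / ||g^i||_2 and the maximizing g^i = sum_j f^j_i f^j."""
    n = len(basis[0])
    best_ratio, best_g = 0.0, None
    for i in range(n):
        g = [sum(f[i] * f[k] for f in basis) for k in range(n)]
        length = math.sqrt(inner(g, g))
        if length == 0.0:
            continue
        ratio = max(abs(x) for x in g) / length
        if ratio > best_ratio:
            best_ratio, best_g = ratio, g
    return best_ratio, best_g


random.seed(0)
n, d, q = 40, 5, 4
basis = orthonormal_basis([[random.gauss(0, 1) for _ in range(n)] for _ in range(d)])
ratio, g = lemma_10_1_witness(basis)
print(ratio >= math.sqrt(d))                                       # Lemma 10.1
print(norm_p(g, q) / norm_p(g, 2) >= math.sqrt(d) / n ** (1 / q))  # Corollary 10.2
```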

Proof of Theorem 2.1 from Corollary 10.2. Let $A\colon \mathbb{R}^m \to \mathbb{R}^n$ be an operator, let $1 < c < C$ be some constants, and let $\sigma = \sigma_{\min}(A)$ be such that $\|Af\|_2 \geq \sigma\|f\|_2$ for every $f$ orthogonal to the kernel of $A$. We want to distinguish between the case that $\|A\|_{2\to q} \leq c$ and the case that $\|A\|_{2\to q} \geq C$. If $\sigma > c$, then clearly we are not in the first case, and so we are done. Let $V$ be the image of $A$. If $\dim(V) \leq C^2 n^{2/q}$, then we can use brute-force enumeration to find out whether such a vector $v$ exists in the space. Otherwise, by Corollary 10.2 we must be in the second case. □
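A toy version of this decision procedure, restricted to the case where the operator is the projector onto a subspace $V$ given by an orthonormal basis (so that $\|A\|_{2\to q} = \|V\|_{2\to q}$), might look as follows. The assumptions are ours: the brute-force enumeration is a uniform grid over coefficient vectors in $[-1,1]^{\dim V}$, its resolution `steps` is a hypothetical parameter rather than a proven net size, and the midpoint threshold $(c + C)/2$ is just one convenient way to separate the two cases.

```python
import itertools


def norm_p(v, p):
    """Expectation-normalized p-norm (E_i |v_i|^p)^(1/p)."""
    return (sum(abs(x) ** p for x in v) / len(v)) ** (1.0 / p)


def grid_estimate(basis, q, steps=40):
    """Lower estimate of ||V||_{2->q} by enumerating a coefficient grid.

    Takes (steps + 1) ** dim(V) iterations, i.e. time exponential in dim(V)."""
    n = len(basis[0])
    grid = [-1.0 + 2.0 * k / steps for k in range(steps + 1)]
    best = 0.0
    for coeffs in itertools.product(grid, repeat=len(basis)):
        v = [sum(c * f[k] for c, f in zip(coeffs, basis)) for k in range(n)]
        length = norm_p(v, 2)
        if length > 1e-12:
            best = max(best, norm_p(v, q) / length)
    return best


def decide(basis, q, c, C, steps=40):
    """Return 'large' (||V||_{2->q} >= C) or 'small' (||V||_{2->q} <= c)."""
    n, d = len(basis[0]), len(basis)
    if d > C**2 * n ** (2.0 / q):
        # Corollary 10.2: ||V||_{2->q} >= sqrt(d) / n^(1/q) > C, no search needed.
        return "large"
    return "large" if grid_estimate(basis, q, steps) > (c + C) / 2 else "small"


n = 16
ones = [[1.0] * n]                                          # ||V||_{2->4} = 1
spike = [[n ** 0.5 if k == 0 else 0.0 for k in range(n)]]   # ||V||_{2->4} = n^(1/4) = 2
print(decide(ones, 4, 1.5, 4.0), grid_estimate(spike, 4))
```

The dimension test costs nothing, so all the exponential work is confined to subspaces of dimension at most $C^2 n^{2/q}$, which is where the $\exp(n^{2/q})$ running time comes from.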
Note that by applying Theorem 2.3 we can replace the brute-force enumeration step by the SoS hierarchy, since $\|V\|_{2\to 2} \leq 1$ automatically, and unless $\|V\|_{2\to\infty} \leq C n^{1/q}$ we will be in the second case.

A corollary of Theorem 2.1 is a subexponential algorithm for Small-Set Expansion.

Corollary 10.3. For every $0.4 > \nu > 0$ there is an $\exp(n^{O(1/\log(1/\nu))})$-time algorithm that, given a graph with the promise that either (i) $\Phi_G(\delta) \geq 1 - \nu$ or (ii) $\Phi_G(\delta^2) \leq 0.5$, decides which is the case.

Proof. For $q = O(\log(1/\nu))$ we find from Theorem 2.4 that in case (i), $\|V_{\geq 0.4}\|_{2\to q} \leq 2/\sqrt{\delta}$, while in case (ii), $\|V_{\geq 0.4}\|_{2\to q} \geq 0.1/\delta^{1-2/q}$. Thus it suffices to obtain a $(2/\sqrt{\delta},\, 0.1/\delta^{1-2/q})$-approximation for the $2\to q$ norm to solve the problem, and by Theorem 2.1 this can be achieved in time $\exp(n^{O(1/\log(1/\nu))})$ for sufficiently small $\delta$. □
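The phrase "for sufficiently small $\delta$" can be made concrete with a little arithmetic: the two thresholds in the proof differ by the factor $\frac{0.1\,\delta^{-(1-2/q)}}{2\,\delta^{-1/2}} = 0.05\,\delta^{2/q - 1/2}$, which exceeds 1 once $\delta$ is small enough, provided $q > 4$. A minimal sketch (function names hypothetical):

```python
import math


def sse_thresholds(delta, q):
    """(c, C) used in the proof of Corollary 10.3."""
    c = 2.0 / math.sqrt(delta)          # case (i): upper bound on ||V_{>=0.4}||_{2->q}
    C = 0.1 / delta ** (1.0 - 2.0 / q)  # case (ii): lower bound on ||V_{>=0.4}||_{2->q}
    return c, C


def gap_ratio(delta, q):
    """C / c = 0.05 * delta ** (2/q - 1/2); the approximation is non-trivial iff > 1."""
    c, C = sse_thresholds(delta, q)
    return C / c


for delta in (1e-4, 1e-6, 1e-8):
    print(delta, gap_ratio(delta, 8))
```

With $q = 4$ the ratio is identically $0.05$ for every $\delta$, so this pair of thresholds only separates the two cases when $q > 4$.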

Conclusions 

This work motivates further study of the complexity of approximating hypercontractive norms such as the $2\to 4$ norm. A particularly interesting question is what the complexity of obtaining a good approximation for the $2\to 4$ norm is, and how this problem relates to the Small-Set Expansion problem. Our work leaves open at least the following three scenarios: (i) both these problems can be solved in quasipolynomial time, but not faster, which would mean that the UGC as stated is essentially false but a weaker variant of it is true; (ii) both these problems are NP-hard to solve (via a reduction with polynomial blowup), meaning that the UGC is true; and (iii) the Small-Set Expansion and Unique Games problems are significantly easier than the $2\to 4$ norm problem, with the most extreme case being that the former two problems can be solved in polynomial time and the latter is NP-hard and hence cannot be done faster than subexponential time. This last scenario would mean that one can improve on the subexponential algorithm for the $2\to 4$ norm for general instances by using the structure of instances arising from the Small-Set Expansion reduction of Theorem 2.4 (which indeed seem quite different from the instances arising from the hardness reduction of Theorem 2.5). In any case, we hope that further study of the complexity of computing hypercontractive norms can lead to a better understanding of the boundary between hardness and easiness for Unique Games and related problems.
A More facts about pseudo-expectation

In this section we note some additional facts about pseudo-expectation functionals that are useful in this paper.

Lemma A.1. The relation $P^2 \preceq P$ holds if and only if $0 \preceq P \preceq 1$. Furthermore, if $P^2 \preceq P$ and $0 \preceq Q \preceq P$, then $Q^2 \preceq Q$.

Proof. If $P \succeq 0$, then $P \preceq 1$ implies $P^2 \preceq P$. (Multiplying both sides with a sum of squares preserves the order.) On the other hand, suppose $P^2 \preceq P$. Since $P^2 \succeq 0$, we also have $P \succeq 0$. Since $1 - P = (P - P^2) + (1 - P)^2$, the relation $P^2 \preceq P$ also implies $P \preceq 1$.

For the second part of the lemma, suppose $P^2 \preceq P$ and $0 \preceq Q \preceq P$. Using the first part of the lemma, we have $P \preceq 1$. It follows that $0 \preceq Q \preceq 1$, which in turn implies $Q^2 \preceq Q$ (using the other direction of the first part of the lemma). □

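Fact A.2. If $f$ is a $d$-f.r.v. over $\mathbb{R}^{\mathcal{U}}$ and $\{P_v\}_{v\in\mathcal{V}}$ are polynomials of degree at most $k$, then $g$ with $g(v) = P_v(f)$ is a level-$(d/k)$ fictitious random variable over $\mathbb{R}^{\mathcal{V}}$. (For a polynomial $Q$ of degree at most $d/k$, the pseudo-expectation is defined as $\tilde{\mathbb{E}}_g Q(\{g(v)\}_{v\in\mathcal{V}}) := \tilde{\mathbb{E}}_f Q(\{P_v(f)\}_{v\in\mathcal{V}})$.)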



Lemma A.3. For $f, g \in L_2(\mathcal{U})$,

$$\langle f, g\rangle \preceq \tfrac12\|f\|^2 + \tfrac12\|g\|^2.$$

Proof. The right-hand side minus the left-hand side equals the square polynomial $\tfrac12\langle f - g, f - g\rangle$. □

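Lemma A.4 (Cauchy-Schwarz inequality). If $(f, g)$ is a level-2 fictitious random variable over $\mathbb{R}^{\mathcal{U}} \times \mathbb{R}^{\mathcal{U}}$, then

$$\tilde{\mathbb{E}}_{f,g}\langle f, g\rangle \leq \sqrt{\tilde{\mathbb{E}}_f\|f\|^2}\cdot\sqrt{\tilde{\mathbb{E}}_g\|g\|^2}.$$

Proof. Let $\bar f = f/\sqrt{\tilde{\mathbb{E}}_f\|f\|^2}$ and $\bar g = g/\sqrt{\tilde{\mathbb{E}}_g\|g\|^2}$. Note that $\tilde{\mathbb{E}}_{\bar f}\|\bar f\|^2 = \tilde{\mathbb{E}}_{\bar g}\|\bar g\|^2 = 1$. Since by Lemma A.3, $\langle \bar f, \bar g\rangle \preceq \tfrac12\|\bar f\|^2 + \tfrac12\|\bar g\|^2$, we can conclude the desired inequality:

$$\tilde{\mathbb{E}}_{f,g}\langle f, g\rangle = \sqrt{\tilde{\mathbb{E}}_f\|f\|^2}\,\sqrt{\tilde{\mathbb{E}}_g\|g\|^2}\;\tilde{\mathbb{E}}_{\bar f,\bar g}\langle \bar f, \bar g\rangle \leq \sqrt{\tilde{\mathbb{E}}_f\|f\|^2}\,\sqrt{\tilde{\mathbb{E}}_g\|g\|^2}\;\underbrace{\Big(\tfrac12\tilde{\mathbb{E}}_{\bar f}\|\bar f\|^2 + \tfrac12\tilde{\mathbb{E}}_{\bar g}\|\bar g\|^2\Big)}_{=1}.$$

□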
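Corollary A.5 (Hölder's inequality). If $(f, g)$ is a 4-f.r.v. over $\mathbb{R}^{\mathcal{U}} \times \mathbb{R}^{\mathcal{U}}$, then

$$\tilde{\mathbb{E}}_{f,g}\,\mathbb{E}_{u\in\mathcal{U}} f(u)g(u)^3 \leq \Big(\tilde{\mathbb{E}}_f\|f\|_4^4\Big)^{1/4}\Big(\tilde{\mathbb{E}}_g\|g\|_4^4\Big)^{3/4}.$$

Proof. Using Lemma A.4 twice (first for the pair $(fg, g^2)$ and then for the pair $(f^2, g^2)$, both level-2 fictitious random variables by Fact A.2), we have

$$\tilde{\mathbb{E}}_{f,g}\,\mathbb{E}_u f(u)g(u)^3 \leq \Big(\tilde{\mathbb{E}}_{f,g}\,\mathbb{E}_u f(u)^2g(u)^2\Big)^{1/2}\Big(\tilde{\mathbb{E}}_g\,\mathbb{E}_u g(u)^4\Big)^{1/2} \leq \Big(\tilde{\mathbb{E}}_f\|f\|_4^4\Big)^{1/4}\Big(\tilde{\mathbb{E}}_g\|g\|_4^4\Big)^{1/4}\Big(\tilde{\mathbb{E}}_g\|g\|_4^4\Big)^{1/2}.$$

□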
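Since an actual probability distribution over pairs $(f, g)$ is a special case of a 4-f.r.v., Corollary A.5 can be spot-checked numerically with empirical averages. The sketch below (names hypothetical) does this; it says nothing about genuinely fictitious random variables, which is where the SoS argument above is needed.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def holder_sides(samples):
    """Both sides of Corollary A.5 when E~ is the uniform average over `samples`.

    Each sample is a pair (f, g) of equal-length lists indexed by u in U."""
    lhs = mean([mean([fu * gu ** 3 for fu, gu in zip(f, g)]) for f, g in samples])
    f4 = mean([mean([fu ** 4 for fu in f]) for f, _ in samples])  # E~ ||f||_4^4
    g4 = mean([mean([gu ** 4 for gu in g]) for _, g in samples])  # E~ ||g||_4^4
    return lhs, f4 ** 0.25 * g4 ** 0.75


random.seed(1)
samples = [([random.gauss(0, 1) for _ in range(6)], [random.gauss(0, 1) for _ in range(6)])
           for _ in range(200)]
lhs, rhs = holder_sides(samples)
print(lhs <= rhs)
```

Taking $f = g$ gives equality, which is a quick way to see that the exponents $1/4$ and $3/4$ cannot be improved.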



B Norm bound implies small-set expansion 

In this section, we show that an upper bound on the $2\to q$ norm of the projector to the top eigenspace of a graph implies that the graph is a small-set expander. This proof appeared elsewhere implicitly [KV05, O'D07] or explicitly [BGH+11] and is presented here only for completeness. We use the same notation as in Section 8. Fix a graph $G$ (identified with its normalized adjacency matrix) and $\lambda \in (0, 1)$, and let $V_{\geq\lambda}$ denote the subspace spanned by eigenfunctions with eigenvalue at least $\lambda$.

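If $p, q$ satisfy $1/p + 1/q = 1$, then $\|x\|_p = \max_{y:\|y\|_q\leq 1}|\langle x, y\rangle|$. Indeed, $|\langle x, y\rangle| \leq \|x\|_p\|y\|_q$ by Hölder's inequality, and by choosing $y_i = \mathrm{sign}(x_i)|x_i|^{p-1}$ and normalizing, one can see that this inequality is tight. In particular, for every $x \in L(\mathcal{U})$, $\|x\|_q = \max_{y:\|y\|_{q/(q-1)}\leq 1}|\langle x, y\rangle|$ and $\|y\|_{q/(q-1)} = \max_{\|x\|_q\leq 1}|\langle x, y\rangle|$. As a consequence,

$$\|A\|_{2\to q} = \max_{\|x\|_2\leq 1}\|Ax\|_q = \max_{\substack{\|x\|_2\leq 1\\ \|y\|_{q/(q-1)}\leq 1}}|\langle Ax, y\rangle| = \max_{\substack{\|x\|_2\leq 1\\ \|y\|_{q/(q-1)}\leq 1}}|\langle x, A^T y\rangle| = \|A^T\|_{q/(q-1)\to 2}.$$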
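The tightness claim can be checked directly: with expectation-normalized norms, the vector $y_i = \mathrm{sign}(x_i)|x_i|^{p-1}$, rescaled to unit norm in the dual exponent $p/(p-1)$, attains $\langle x, y\rangle = \|x\|_p$. A minimal sketch (names hypothetical):

```python
import math


def norm_p(v, p):
    """Expectation-normalized p-norm (E_i |v_i|^p)^(1/p)."""
    return (sum(abs(t) ** p for t in v) / len(v)) ** (1.0 / p)


def inner(x, y):
    """Expectation-normalized inner product E_i x_i y_i."""
    return sum(a * b for a, b in zip(x, y)) / len(x)


def dual_witness(x, p):
    """Unit vector in the dual norm (exponent p/(p-1)) attaining <x, y> = ||x||_p.

    Assumes x is not identically zero."""
    p_dual = p / (p - 1.0)
    y = [math.copysign(abs(t) ** (p - 1), t) for t in x]
    scale = norm_p(y, p_dual)
    return [t / scale for t in y], p_dual


x = [3.0, -1.0, 0.5, 2.0]
y, p_dual = dual_witness(x, 4)
print(inner(x, y), norm_p(x, 4), norm_p(y, p_dual))
```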

Note that if $A$ is a projection operator, $A = A^T$, so $\|A\|_{2\to q} = \|A\|_{q/(q-1)\to 2}$. Thus, part 1 of Theorem 2.4 follows from the following lemma:

Lemma B.1. Let $G = (V, E)$ be a regular graph and $\lambda \in (0, 1)$. Then, for every $S \subseteq V$,

$$\Phi(S) \geq 1 - \lambda - \|V_{\geq\lambda}\|_{q/(q-1)\to 2}^2\,\mu(S)^{(q-2)/q}.$$

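Proof. Let $f$ be the characteristic function of $S$, and write $f = f' + f''$, where $f' \in V_{\geq\lambda}$ and $f'' = f - f'$ is the projection to the eigenvectors with eigenvalue less than $\lambda$. Let $\mu = \mu(S)$. We know that

$$\Phi(S) = 1 - \langle f, Gf\rangle/\|f\|_2^2 = 1 - \langle f, Gf\rangle/\mu. \tag{B.1}$$

Also, $\|f\|_{q/(q-1)} = \big(\mathbb{E}_x f(x)^{q/(q-1)}\big)^{(q-1)/q} = \mu^{(q-1)/q}$, meaning that $\|f'\|_2 \leq \|V_{\geq\lambda}\|_{q/(q-1)\to 2}\,\mu^{(q-1)/q}$.

We now write

$$\langle f, Gf\rangle = \langle f', Gf'\rangle + \langle f'', Gf''\rangle \leq \|f'\|_2^2 + \lambda\|f''\|_2^2 \leq \|V_{\geq\lambda}\|_{q/(q-1)\to 2}^2\,\|f\|_{q/(q-1)}^2 + \lambda\mu \leq \|V_{\geq\lambda}\|_{q/(q-1)\to 2}^2\,\mu^{2(q-1)/q} + \lambda\mu. \tag{B.2}$$

Plugging this into (B.1) yields the result. □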
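Identity (B.1) expresses the expansion of $S$ through the quadratic form of the normalized adjacency matrix. On the $n$-cycle, where $(Gf)(x) = \frac12(f(x-1) + f(x+1))$, it can be compared against a direct edge count; the sketch below (names hypothetical, $n \geq 3$ assumed) does so with expectation-normalized inner products.

```python
def expansion_via_B1(n, S):
    """Phi(S) = 1 - <f, Gf> / mu(S) on the n-cycle, with f the indicator of S."""
    S = set(S)
    f = [1.0 if x in S else 0.0 for x in range(n)]
    Gf = [(f[(x - 1) % n] + f[(x + 1) % n]) / 2.0 for x in range(n)]
    quad = sum(a * b for a, b in zip(f, Gf)) / n  # <f, Gf> = E_x f(x) Gf(x)
    mu = len(S) / n
    return 1.0 - quad / mu


def expansion_by_counting(n, S):
    """Fraction of the 2|S| edge endpoints in S whose other endpoint lies outside S."""
    S = set(S)
    leaving = sum(1 for x in S for y in ((x - 1) % n, (x + 1) % n) if y not in S)
    return leaving / (2 * len(S))


print(expansion_via_B1(12, range(4)), expansion_by_counting(12, range(4)))
```

An arc of length $k$ loses exactly two of its $2k$ edge endpoints, so both functions return $1/k$ for it.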

C Semidefinite Programming Hierarchies 

In this section, we compare different SDP hierarchies and discuss some of their properties. 
C.1 Example of Max Cut



In this section, we compare the SoS hierarchy and the Lasserre hierarchy using Max Cut as an example. (We use a formulation of Lasserre's hierarchy similar to the one in [Sch08].) It will turn out that these different formulations are equivalent up to (small) constant factors in the number of levels. We remark that the same proof with syntactic modifications shows that our SoS relaxation of Unique Games is equivalent to the corresponding Lasserre relaxation.

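Let $G$ be a graph (an instance of Max Cut) with vertex set $V = \{1, \ldots, n\}$. The level-$d$ Lasserre relaxation for $G$, denoted $\mathrm{lass}_d(G)$, is the following semidefinite program over vectors $\{v_S\}_{S\subseteq[n],\,|S|\leq d}$:

$$\begin{aligned}\mathrm{lass}_d(G):\quad &\text{maximize} && \sum_{(i,j)\in G}\|v_i - v_j\|^2\\ &\text{subject to} && \langle v_S, v_T\rangle = \langle v_{S'}, v_{T'}\rangle \text{ for all sets with } S\,\Delta\,T = S'\,\Delta\,T'.\end{aligned}$$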

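The level-$d$ SoS relaxation for $G$, denoted $\mathrm{sos}_d(G)$, is the following semidefinite program over $d$-p.e.f. $\tilde{\mathbb{E}}$ (and $d$-f.r.v. $x$ over $\mathbb{R}^V$):

$$\begin{aligned}\mathrm{sos}_d(G):\quad &\text{maximize} && \tilde{\mathbb{E}}_x\sum_{(i,j)\in G}(x_i - x_j)^2\\ &\text{subject to} && \tilde{\mathbb{E}}_x(x_i^2 - 1)^2 = 0 \text{ for all } i \in V.\end{aligned}$$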

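From Lasserre to SoS. Suppose $\{v_S\}$ is a solution to $\mathrm{lass}_d(G)$. For a polynomial $P$ over $\mathbb{R}^V$, we obtain a multilinear polynomial $P'$ by successively replacing squares $x_i^2$ by 1. (In other words, we reduce $P$ modulo the ideal generated by the polynomials $x_i^2 - 1$ with $i \in V$.) We define a $d$-p.e.f. $\tilde{\mathbb{E}}$ by setting $\tilde{\mathbb{E}}P = \sum_{|S|\leq d} c_S\langle v_\emptyset, v_S\rangle$, where $\{c_S\}_{|S|\leq d}$ are the coefficients of the polynomial $P' = \sum_{|S|\leq d} c_S\prod_{i\in S}x_i$ obtained by making $P$ multilinear. The functional $\tilde{\mathbb{E}}$ is linear (using $(P + Q)' = P' + Q'$) and satisfies the normalization condition. We also have $\tilde{\mathbb{E}}(x_i^2 - 1)^2 = 0$, since $(x_i^2 - 1)^2 = 0$ modulo $x_i^2 - 1$. Since $\tilde{\mathbb{E}}(x_i - x_j)^2 = \|v_i - v_j\|^2$ for all $i, j \in V$ (using $\langle v_\emptyset, v_{ij}\rangle = \langle v_i, v_j\rangle$), our solution for $\mathrm{sos}_d(G)$ has the same objective value as our solution for $\mathrm{lass}_d(G)$. It remains to verify positivity. Let $P$ be a polynomial of degree at most $d/2$, so that $P^2$ has degree at most $d$. We may assume that $P$ is multilinear, so that $P = \sum_{|S|\leq d/2} c_S x_S$, where $x_S = \prod_{i\in S}x_i$. Therefore $P^2 = \sum_{S,T} c_S c_T x_S x_T$ and $\tilde{\mathbb{E}}P^2 = \sum_{S,T} c_S c_T\langle v_\emptyset, v_{S\Delta T}\rangle$. Using the property $\langle v_\emptyset, v_{S\Delta T}\rangle = \langle v_S, v_T\rangle$, we conclude $\tilde{\mathbb{E}}P^2 = \sum_{S,T} c_S c_T\langle v_S, v_T\rangle = \big\|\sum_S c_S v_S\big\|^2 \geq 0$.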
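The map $P \mapsto P'$ and the functional $\tilde{\mathbb{E}}$ built from Lasserre vectors are easy to make concrete. In the sketch below, the names and the dictionary encoding of polynomials (exponent tuples mapped to coefficients) are our assumptions; reducing modulo $x_i^2 - 1$ keeps only exponent parities, and $\tilde{\mathbb{E}}P = \sum_S c_S\langle v_\emptyset, v_S\rangle$. As a test case it uses the one-dimensional vectors $v_S = \prod_{i\in S}x_i$ of an integral cut $x \in \{\pm1\}^n$, for which $\tilde{\mathbb{E}}$ is just evaluation at $x$.

```python
import itertools
from collections import defaultdict


def multilinearize(poly):
    """Reduce poly modulo the ideal (x_i^2 - 1)_i by keeping exponent parities.

    poly maps exponent tuples (e_1, ..., e_n) to coefficients; the result maps
    frozensets S (indices with odd exponent) to the coefficients c_S of P'."""
    reduced = defaultdict(float)
    for exps, coeff in poly.items():
        S = frozenset(i for i, e in enumerate(exps) if e % 2 == 1)
        reduced[S] += coeff
    return dict(reduced)


def pseudo_expectation(poly, vectors):
    """E~ P = sum_S c_S <v_empty, v_S>; vectors[S] must exist for every S in P'."""
    v_empty = vectors[frozenset()]
    return sum(c * sum(a * b for a, b in zip(v_empty, vectors[S]))
               for S, c in multilinearize(poly).items())


def integral_vectors(x, d):
    """Lasserre vectors v_S = (prod_{i in S} x_i,) of a +-1 assignment x, |S| <= d."""
    vecs = {}
    for k in range(d + 1):
        for S in itertools.combinations(range(len(x)), k):
            prod = 1.0
            for i in S:
                prod *= x[i]
            vecs[frozenset(S)] = [prod]
    return vecs


edge = {(2, 0): 1.0, (1, 1): -2.0, (0, 2): 1.0}   # (x_1 - x_2)^2
print(pseudo_expectation(edge, integral_vectors([1, -1], 2)))
```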

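From SoS to Lasserre. Let $\tilde{\mathbb{E}}$ be a solution to $\mathrm{sos}_d(G)$. We will construct a solution for $\mathrm{lass}_{d/2}(G)$ (assuming $d$ is even). Let $d' = d/2$. For $\alpha \in \mathbb{N}^n$, let $x^\alpha$ be the monomial $\prod_{i\in[n]}x_i^{\alpha_i}$. The polynomials $\{x^\alpha\}_{|\alpha|\leq d'}$ form a basis of the space of degree-$d'$ polynomials over $\mathbb{R}^n$. Since $\tilde{\mathbb{E}}P^2 \geq 0$ for all polynomials $P$ of degree at most $d'$, the matrix $(\tilde{\mathbb{E}}x^\alpha x^\beta)_{|\alpha|,|\beta|\leq d'}$ is positive semidefinite. Hence, there exist vectors $v_\alpha$ for $\alpha$ with $|\alpha| \leq d'$ such that $\tilde{\mathbb{E}}x^\alpha x^\beta = \langle v_\alpha, v_\beta\rangle$. We claim that the vectors $v_\alpha$ with $\alpha \in \{0, 1\}^n$ and $|\alpha| \leq d'$ form a solution for $\mathrm{lass}_{d'}(G)$. The main step is to show that $\langle v_\alpha, v_\beta\rangle$ depends only on $\alpha + \beta \bmod 2$. Since $\langle v_\alpha, v_\beta\rangle = \tilde{\mathbb{E}}x^{\alpha+\beta}$, it is enough to show that $\tilde{\mathbb{E}}$ satisfies $\tilde{\mathbb{E}}x^\gamma = \tilde{\mathbb{E}}x^{\gamma \bmod 2}$. Hence, we want to show $\tilde{\mathbb{E}}x_i^2P = \tilde{\mathbb{E}}P$ for all polynomials $P$ (with appropriate degree). Indeed, by Lemma 3.5,

$$\tilde{\mathbb{E}}(x_i^2 - 1)\cdot P \leq \sqrt{\tilde{\mathbb{E}}(x_i^2 - 1)^2}\,\sqrt{\tilde{\mathbb{E}}P^2} = 0.$$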
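For a genuine pseudo-expectation, the vectors $v_\alpha$ come from factoring the positive semidefinite moment matrix (for example, by a Cholesky or eigendecomposition). The sketch below sidesteps that step by assuming $\tilde{\mathbb{E}}$ is an actual distribution over $\{\pm1\}^n$, where $v_\alpha = (\sqrt{p_k}\,x_k^\alpha)_k$ is an explicit factorization. It then checks the Lasserre constraint $\langle v_S, v_T\rangle = \langle v_{S'}, v_{T'}\rangle$ whenever $S\,\Delta\,T = S'\,\Delta\,T'$. Names are hypothetical.

```python
import itertools
import math


def moment_vectors(dist, d_prime):
    """Vectors v_S, |S| <= d_prime, with <v_S, v_T> = E x^S x^T.

    dist is a list of (probability, point) pairs with point in {-1, +1}^n."""
    n = len(dist[0][1])
    vecs = {}
    for k in range(d_prime + 1):
        for S in itertools.combinations(range(n), k):
            vecs[frozenset(S)] = [math.sqrt(p) * math.prod(x[i] for i in S) for p, x in dist]
    return vecs


def lasserre_consistent(vecs, tol=1e-12):
    """True iff <v_S, v_T> depends only on the symmetric difference S ^ T."""
    seen = {}
    for S, T in itertools.product(vecs, repeat=2):
        value = sum(a * b for a, b in zip(vecs[S], vecs[T]))
        key = S ^ T
        if key in seen and abs(seen[key] - value) > tol:
            return False
        seen.setdefault(key, value)
    return True


dist = [(0.5, (1, 1, -1)), (0.3, (-1, 1, 1)), (0.2, (1, -1, -1))]
vecs = moment_vectors(dist, 2)
print(lasserre_consistent(vecs))
```

Here the parity property holds trivially because $x_i^2 = 1$ pointwise; the SoS argument above is what guarantees it when $\tilde{\mathbb{E}}$ is only a pseudo-expectation.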